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OVERVIEW 



1.1 Introduction 

The Meiko ELAN communications processor is the interface between processing 
nodes and the Meiko packet switched network. This network is constructed from 
switch components that are capable of switching eight bidirectional 55MBytes/s 
communications channels. Basic packets arc between 40 and 320 bytes long, and are 
routed according to headers that are attached by the communications processor. The 
switching network supports hardware ACK/NACK for each packet, and implements 
a lower level protocol for flow control at the byte level. The network allows anywhere 
to anywhere connectivity, as well as broadcast across selected processor ranges. 

1.2 Major Objectives 

The major objectives for the communications processor are: 

To process in-coming packets from the network as quickly as possible. 

To ensure reliable message delivery. 

To minimise the number of interrupts to the main processor. 

To reduce latency during communication start-up. 

To remove the need for multiple lightweight processes on the main processor. 

To utilise main store bandwidth efficiently, 

To maintain security between unrelated operations. 

To filter out erroneous transmissions before they are transferred onto the network. 
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13 Processing Units 

The ELAN consists of six individual processing units. In the initial implementation 
these are all implemented on the same microcoded engine, but future implementations 
might use different strategies. 

The six units are: 

• Input processors — two of these handle incoming packets from the network. 

• Thread processor — executes short code fragments to handle checking and 
outputting of protocols. 

• DMA processor — executes store-to-store DMAs, and store-to-remote DMAs. 

• Reply processor — returns results of remote reads. 

• Command processor — executes commands initiated from the main processor. 

Three of the six processing units are capable of outputting packets into the network; 
these are the thread, output, and reply processors. The reply and dma processors 
construct packets that are appropriate to their function; that is, dma packets consist 
of block read or write transactions, and reply packets consist of word write and event 
transactions. The thread processor constructs packets using its instruction and data 
stream, and is the only output processor that can construct arbitrary packets. 

All three processors must share a single resource, called the outputter, for transmitting 
their packets onto the network. 

All of these processors are capable of virtual operation and can execute in any of 64K 
contexts. All processors can be considered as executing from queues, and in some 
cases can cause work to be added to other queues; an input processor, for example, 
may cause a reply to be added to the reply processors queue. 

13.1 Input Processor 

The input processor is defined by the network protocols, as described in B. The 
characteristic of the packet protocol is that each packet contains all the state 
information that is needed to process them. Communications are essentially by 
remote write operations, unlike transputer communications where there is assumed 
to be a cooperating process at the receiving end. The input processor is therefore 
stateless, and takes its context from the incoming packet stream. 
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132 Thread Processor 



The thread processor is a small RISC processor with lightweight scheduling 
capabilities similar to the transputer. A number of special instructions give access 
to die output port and onto the network. Although the output processor is capable of 
running complied C programs (at a performance of about 3 to 4 MIPS), it should not 
be used to offload computation from the main processor, its primary aim is to handle 
the protocols of the input and output messages, thus ensuring that the main processor 
is only interrupted when it is really necessary. 



13 3 DMA Processor 



The DMA processor performs DMAs either locally (store-to-store), or across the 
network. This processor converts the data into packet format, and ensures that 
processes are woken-up when the DMA is complete. 



1.3.4 Reply Processor 



One very important function is the remote reading or read-modify-writing of memory 
for program and lock control. These are one or two word transactions which require 
an inputing process to generate an output packet To help achieve some decoupling 
between the inputter and outputer for these operations a queue of replies is used. The 
reply processor works on this queue and generates correct output packets. 



1.3.5 Command Processor 



The command processor executes commands that are issued by the main processor. 
The interface between the command processor and the main processor has been 
designed so that communications can be initiated by a simple series of writes from 
the main processor. 

1.4 Elan Implementation 

The initial implementation of ELAN is a single CMOS sea of gates ASIC with: 
Single microcoded engine. 

72 words x 64 bits dual ported memory, 640Mbyte/s bandwidth. 
Communications MMU. 

Two Meiko byte wide channels permitting redundant networks. 
Full level 2 Mbus interface. 
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• Approximately 45K gates, excluding ROM and RAM. 

• 208 pin PGA packages. 

1.5 Using This Reference Manual 

This section provides information to help you use this manual It includes an 
overview of the manual, a definition of the intended audience, a description of the 
fonts used and what they mean. 

1.5.1 Contents 

The chapter after this describes the virtual process model used in the ELAN 
architecture. The following five chapters describe each of the five types of processor 
on the ELAN processor itself. The final chapter explains the event mechanism and 
exceptions. The appendices describe in detail the registers and memory maps used 
by the ELAN processor. 

1.5.2 Audience 

This document is intended to be read by people requiring a detailed insight into the 
operation and capabilities of the ELAN communications processor. In addition the 
appendices are sufficiently detailed to enable ELAN device drivers to be written. 
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The ELAN contains its own memory management unit (MMU). Any process or work 
item has a hardware context associated with it, and this is used by the MMU to 
interpret virtual addresses. 

Within the ELAN network, each process has a virtual process number. This allows 
other processes to reference it by virtual process number. Communication consists of 
sending transactions to a virtual process, the destination process number is translated 
by a local process mapping table that is specific to each process. This table is called 
the virtual process table. 

The translation of virtual to physical addresses via context dependent tables provides 
security and portability. Portability is facilitated because the mapping may be 
changed at any time. Security is provided, because a process can only communicate 
with those processes that are explicitly mapped in the table. 

Generally the MMU and mapping functions will be under the control of the main 
processor. This means that only areas of store which have been specifically authorised 
for communications can be altered. Under kernel control, messages and remote writes 
can be enabled direct into a user spaces without further main CPU intervention. 



2.1 The Virtual Process Model 



The ELAN virtual process model has been developed to provide usable lightweight 
process communication within the familiar UNIX style programming model. The 
model that is used by ELAN incorporates the protection and security of UNIX, with 
much reduced process switch times. 
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2.1.1 Lightweight Processes 



The lightweight processes are provided by hardware assisted scheduling. Many 
threads of execution may reside within a virtual process. Within a process, threads are 
not protected from one another, although processes are protected from one another, so 
that one cannot damage the execution (memory map) of any other. This is analogous 
to UNIX, where processes are protected from each other, although the reliability of 
any one process is entirely dependent on the programmer that created it 

Processes communicate by reading and writing between each other's memory spaces 
using network transactions or DMAs. Higher level signalling and locking are 
provided by a the virtual event mechanism. 



2.1.2 Programming Model - Events 



The Elan model is principally based upon the communicating process model of 
parallelism. Several paradigms exist here, the most notable being Communicating 
Sequential Processes (CSP) or, in its more developed form, OCCAM. The CSP 
concept of synchronisation turns out to be too restrictive for many applications; the 
ELAN model therefore uses a much looser but more powerful mechanism known as 
events. 

An event is a memory location with specific contents defining the state of the event 
Events act like user defined interrupts. Processes can non-busily wait on an event, 
poll events or cause events. 

2.1.2.1 Queued Events 

Queued events are allowed where several processes may wait on an event queue. As 
events occur a waiting process will be woken off the top of the queue. This facilitates 
the writing of secure and efficient resource sharing code. 

When queued events are used, only the access to the dual-word event location is 
treated as an event access; queue handling must be done with local permissions. In 
this way protected event vectors can provide a signal mechanism across the network, 
with similar security to a standard trap interface. 
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2.13 Protection 



The virtual processes are protected from each other by hardware contexts in a MMU. 
Threads within processes do not require such protection; a thread within its process 
has the same security as the process itself. A virtual process will normally be acting 
on behalf of a larger more weighty applications process. This may have a number of 
co-operating virtual processes providing it with different services or functions. The 
applications process will not share the same hardware context as a virtual process but 
instead will have areas of overlapping memory spaces pertinent to the co-operation 
at hand. 

One major problem with a networked system is security; security can usually 
be provided but at substantial overhead in communication startup. The ELAN 
communications processor overcomes this by incorporating a paged MMU for all 
memory accesses. This MMU is appropriate for implementing standard UNIX 
kernels and has additional access types to provide protection to remote and event 
accesses. This allows different permissions to be set up for areas of memory accessed 
by other processes. 

Programmers information for the MMU is given in Appendix H. 



2.2 Network Protocol 



The network protocol is accessed from a virtual process by sending packets to other 
processes. These are groups of transactions which alter the memory and scheduling 
state of the destination process. A packet is sent by executing an OPEN instruction 
and by specifying the recipient's virtual process number. A number of transactions 
are then sent by use of the SENDJTRANS instruction, and the packet is terminated 
by use of the CLOSE instruction. 



2.3 Virtual Process Table 



To allow control over the virtual processes with which a process can communicate, 
a table (called the Virtual Process table) is used. These tables reside in physically 
mapped store and aren't accessible directly by a virtual process. Each virtual process 
has a different table. The address of the virtual table base is held in a context control 
block which is loaded as required for each change of context In addition to the 
'permission' aspect of the table a translation is made of virtual process number to 
physical processor number and hardware context 

The structures that are used to define the virtual process table are denned in Appendix 
A. 
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2.4 Route Table 



After converting a virtual process identifier to a physical processor number, a 
transmission route for the communication is obtained from the route table. The base 
of the route table is defined by a register within the communications processor. 

2.4.1 Routing Strategy 

The routing strategy can be altered through the contents of the routing table. This 
table contains four routes for each destination processor one of which is selected 
randomly for each packet. Each route is defined by a 16 byte entry, any of which may 
be the same. The random selection of one-from-four routes is intended to provide 
some degree of protection from congestion within the network. Random selection 
has better worst case performance than other more active forms of communications 
load balancing while being simple to implement 

2.4.2 Broadcast Communications - - 

The ELAN allows broadcast communications to a contiguous array of ELITE links. 
To allow one or more of the destination processes to be omitted from this contiguous 
array, the MMU is programmed to ignore the incoming messages. 

2.43 Route Table Modification 

To stop processors using a route the destination processor has to be removed from 
the virtual process tables. If a route has to be removed quickly it can be done so by 
giving NULL entries by zeroing the first of each route in the route table. An attempt 
to use a NULL route causes the outputting processor to trap. 

The table can be placed at the same physical location within node's store to enable 
easy modification of all tables in the network, via a broadcast command. 
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The input processor receives packets from the network. Its objective is to remove 
them and process them as quickly as possible to avoid network congestion. 



3.1 Network Packet Protocol 



The network byte order is big endian, although communication processors accessing 
the network may have little endian memory systems provided that they put data onto 
the network in die network byte order. 

A packet is composed of route information, a start-of-packet (SOP) delimiter, a 
number of transactions, and an end-of-packet (EOP) delimiter. Route data is added 
to the head of the packet by the sender, and is stripped off as it is used in the network. 
An inputter removes any extra route information that has not been been stripped by 
the network up to the SOP. 

packet = route bytes, SOP, transaction, transactions, > EOP. 

3.1.1 Transaction Format 

Each transaction within the packet consists of one or more 64-bit words, followed by a 
16-bit cyclic redundancy check (CRQ. The first double word contains a transaction 
type, a context, and an address. Transactions can always be handled immediately 
they are received, and contain all the information that is necessary to execute them. 
In particular, a transaction can always be handled without doing any further network 
operations; this avoids cyclic dependencies. 

The transaction executed are by the ELAN input registers are detailed in Appendix 
B "Input processor". 
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3.1.2 Packet Acknowledge and Negative Acknowledge 

Where a packet consists of a number of transactions, the transactions are executed in 
the order they arrive. Each packet will be either acknowledged (ACK) or negative 
acknowledged (NACK). The point at which the ACK is sent in a packet is determined 
by the AckNow bit (Appendix B). If a transaction has its AckNow bit set, and has a 
valid CRC, then an ACK will be sent 

If the receiver returns an ACK then all transactions up to and including the transaction 
that generated the ACK will have been received and processed. If the sender receives 
a NACK, this means the receiving input was unable to send an ACK. The reasons for 
the input being unable to send an ACK could be one of the following. 

a) an input was busy 

b) a CRC error was detected 

c) a network conditional failed 

Where an input processing device has multiple channels, Elan 1.0 has two, then 
transactions after an ACK on one channel will be processed before any further 
transactions on the other channel. This ensures that the order in which ACKs are 
sent corresponds to the order in which transactions are processed. Devices outputing 
onto the network should not send another packet until they have received an ACK 
or NACK for the current one so as to ensure that packets arrive in the correct order, 
irrespective of the route they take. A packet must contain at least one transaction 
where the ackNow bit is set NACK may be sent at any time on a multiple transaction 
packet unless an ACK has been sent NACK means that at least one transaction of 
the multiple transactions in a packet has not occurred. When a circumstance arises 
where the inputter would have sent a NACK if an ACK had not already been sent 
(in particular a CRC error in a transaction after ackNow), that transaction, and all the 
remaining transactions in the packet, have to be trapped to an error buffer. 

Asserting AckNow on the first of a group of transactions allows an overlapping of 
acknowledges so as to preserve network bandwidth where network latency is of the 
order of a transaction time. 

Input SOP Tl T2 T3 T4 EOP SOP Tl 
Output ACK ACK 
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3.2 Network Conditionals 

Network conditionals provide a way of testing the value of an object across a network. 
If the condition is false a NACK is sent and the rest of the packet is discarded. If 
the condition is true the next transaction will be executed. This permits conditional 
writing of blocks of data, where a flag value can be tested, and a number of speculative 
writes in a packet will be performed only if the flag is set Multiple conditional 
transactions are permitted in one packet, the effect being of a sequentially evaluated 
AND of the conditions. Only one ACK or NACK per packet may be sent 

Note that as NACK is also used to signal that a packet has been rejected for some 
reason, ( bad CRC etc), receiving a NACK from a network conditional does not mean 
that the logical complement is true. ACK means that the conditional was performed 
and was true, NACK means the converse (not performed OR not true). 

Network conditionals are of particular value when used with broadcast to 
synchronise large numbers rof processors." For example barrier synchronisation can 
be achieved by doing a broadcast trJEQ, to check that all processors in a group have 
set a flag. 

33 Context Filter 

The inputter has a context filter register, which if it matches the input packet context 
will NACK the packet automatically. This is intended to be used to filter out 
communications to a particular hardware context Its use is detailed in Appendix 
B on the Input processor. 



3.4 Broadcast 



The network is capable of broadcasting to any contiguous range of processors. 
Any packet may be broadcast The ACK and NACK signals from the various 
processors are recombined in the network. All components must respond. The packet 
transmitter sees an ACK when all components have sent an ACK, or a NACK when 
all components have responded and at least one has responded with a NACK. 

Receiving a NACK does not mean a packet has not occurred on any of the inputters. 
The only statement that can be made is that an ACK means that the transaction has 
been successfully received on all components of the broadcast When conditional 
execution of transactions is employed it should be remembered that each inputter 
sees its own ACK or NACK value, not the value returned to the outputter. 
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3.5 Exceptions 

Exceptions that can be generated by an input processor are: 



DATAJVCCESS__EXCEPTION 

EVENT__QUEUEJ}VERFLOW 

EVENT_INTERRUPT 

QUEUEJDVERFLOW 

UNIMPLEMENTED 

MEM_ADDRESS_NOT_ALIGNED 

Input specific exceptions: 

IPROC_NACK_AFTER_ACK 

If the inputter tries to send a N ACK when it has already set an ACK was sent then an 
IPROC_NACK_AFmR_ACK exception is taken. 
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The thread processor executes virtual processes and threads, which can be generated 
from compiled C source code. The thread processor has two basic types of 
instruction: local instructions execute as normal instructions on a register based 
machine; network instructions are used to create network transactions that normally 
execute on a remote input processor. 

Processes that execute within the thread processor are virtual processes, and these are 
described in detail in chapter 2 

4.1 Registers 

The thread processor has two types of registers associated with it; working registers 
(r registers) and control/status registers. Working registers are used for normal 
operations and control/status registers keep track of and control the state of the 
processor. 

4.1.1 r registers 

All r registers are 32 bits wide. They are divided into a zero register and 8 in registers 
and 8 out registers. 

register number Name 

r[24] to r[31] Ins 

r[16] to r{23] Unimplemented 

r[8] to r[15] Outs 

r[l] to r[7] Unimplemented 

r[0] Zero 
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4.2 Network Instructions 

Network instructions are used to generate packets containing transactions. 

4.2.1 OPEN, SENDTRANS, and CLOSE 

The remote instructions OPEN and CLOSE are used to delimit a network packet; 
OPEN defines the virtual processor to which the packet is destined, and CLOSE 
sends the EndOf Packet signal after an acknowledge has been received. Between 
these delimiters each remote instruction translates into a single network transaction. 
Local instructions may execute between OPEN and CLOSE, but remote instructions 
executed outside of the delimiters will cause a trap. 

The OPEN instruction does the translation of the destination virtual processor number 
into a physical processor number and context It also fetches the route bytes for that 
destination processor and assembles them into a StartOf Packet. This will begin 
to make its way to the destination processor. The OPEN instruction may fail if the 
process cannot be translated, in which case it will trap. 

Following an OPEN instruction, the SENDTRANS command is used to send 
transactions. 

The CLOSE instruction ensures that an acknowledge has occurred, and waits if it 
has not When an acknowledge has occurred an end of packet signal (EOP) is sent 
The received acknowledge is examined, and a value of is returned if a NACK was 
received, and a value of 1 if it was acknowledged by an ACK. 

423, Time Out 

The period between an OPEN and a CLOSE has to be restricted by a timeout 
mechanism. During the execution of an OPEN instruction a bit is set in the status 
register, and a TIMEOUT trap is generated if this is not cleared within the time-out 
period. The TIMEOUT trap is synchronous, and occurs during the execution of the 
CLOSE instruction. 
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4.3 Local Instructions 

The local instructions are optimised for communications with additions for the local 
event control, locked memory operations, scheduling, and DMAs. They execute on 
a windowed register model. 

The local instructions are executed on a windowed register model. Only registers 
i7-i0, o7-o0 and gO and the ice are implemented. 

Upon de-scheduling a process, registers i0-i7, o0-o5, o7, and Iptr are saved. The 
integer condition codes are saved in the status register until another thread process 
begins to execute. The context and SP are implied in a process. The stack frame 
being stored relative to SP. 

The SP can only point to 32 byte boundaries so that the SP can only be adjusted in 8 
word increments. This enables the de-scheduling, SAVE and RESTORE operations 
to be done with block read/writes. 

4.4 Scheduling 

The thread processor has a simple run queue scheduling model. Processes which are 
run are placed at the back of the run queue. Processes are stored with the context in 
which they are executing. Processes are taken off the run queue and run until they de* 
schedule themselves or are forced to by a trap. Four instructions are used to control 
scheduling. The run queue is held as memory based FIFO. 

A process is specified by its Stack Pointer (SP or 06) and context A de-scheduled 
process has all its state stored in its stack frame. After quoting a processes wptr to 
another process, the other process may try and run it while it is still executing. The 
SP should therefore never be changed and the process should SUSPEND as soon as 
possible to avoid confusion. 
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The DMA processor can be used for local (store-to-store) transfers, or for network 
(store-to-remote-store) transfers. For network transfers, the DMA processor 
constructs packets with arbitrary numbers of read/write block transactions. Unlike 
conventional DMAs, the amount of data transferred by a transaction, and the number 
of transactions within the packet, are variable and dependent on network loading. At 
times of high load the packet size may be reduced (between 4 to 16 transactions), 
and the transfer may even be suspended. The maximum amount of data transferred 
within a single transaction is 32 bytes. 

5.1 DMA Descriptor 

To request a DMA the main processor constructs a DMA descriptor and writes the 
address of this descriptor to the ELAN command port The ELAN's command 
processor extracts these descriptors from the port and adds them onto the DMA queue. 

The DMA processor extracts descriptors from the queue when it is ready to process 
them. The DMA processor constructs the necessary packets and encapsulates the 
data within the transactions. 

5.2 Typing of DMAs 

The DMA engine can handle typed data. The default DMA type is byte. If the store 
ordering of both sender and receiver is the same then typing does not matter. If the 
store ordering is different then typed transfers ensure that the correct movement of 
data occurs. Data must be aligned to the required type size. Local byte block moves 
can be used to move data to any alignment. 
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5.3 DMA Failures 

Within a large network of ELAN processors the majority of traffic being transferred 
will be in the form of DMAs. Errors will occur on the links connecting these 
processors, and are detected by the network. Bit errors on data bytes will cause a 
CRC failure and the packet will be NACKed. 

To provide some insulation from these errors the DMA descriptor contains a fail 
count This is the number of non-corrupting failures that can be tolerated in a 
particular DMA. If a failure occurs and the fail count is non zero the DMA will be 
re-queued such that the failing part of the DMA is re-attempted, and the fail count is 
decremented. 

If the DMA failure count is zero when a failure occurs then the 
DMA_FAILED__COUNT_ERROR exception occurs. 

5.4 Exceptions 

Exceptions that can be generated by the DMA processor are: 

DATA_ACCESS_EXCEPTION 

DNIMPLEMENTED 

OUTPUT_INVALID_PROCES S 

OUTPUT_INVALID_ROUTE 

OUTPUT_ALREADY_OPEN 

EVENT_INTERRUPT 

QUEUE_OVERFLOW 

DMA specific exceptions: 

DMA_FAILED_COUNT_ERROR 

5.5 Normal and Secure Transfers 

DMAs are denned with two modes of operation: normal and secure. In normal mode 
the objective is to move data at as high a bandwidth as possible. In secure mode the 
correct writing of data is assured. 

normal and secure transfers are really only different for remote DMAs. 
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Normal DMA opens a packet and outputs a DMA transaction with the SENDACK 
bit set. It continues outputing DMA transactions without the SENDACK bit set until 
either 

a) the ACK or NACK is received. 

b) 16 transactions have been sent 

c) the DMA time slice period is exceeded. 

When any of the above are satisfied, the DMA processor prepares the 
END_OF__PACKET to be sent If the DMA time slice period has been exceeded 
or a NACK has been received the DMA descriptor is updated and placed on the back 
of the queue. Otherwise the process is repeated. 

Remote DMA opens a packet and outputs 16 transactions provided that DMA time 
slice period isn't exceeded. These are sent without SENDACK set, and an additional 
NULL transaction is sent at the end with SENDACK set This ensures that errors 
occurring at the input are observed by the DMA processor. 



5.6 DMA Timeslice Period 



The DMA timeslice period is 50 micro seconds. This value was decided upon such 
that if only the DMA process is running and the network isn't congested then the 
overhead of time slicing would be less than 5%. For Elan 1.0 this is approximately 
50 micro seconds or the time to send 2K bytes. 
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It is a desirable requirement that input transactions are dealt with as quickly possible. 
To help achieve this some decoupling is required between the inputter and outputer 
for transactions that require a reply, such as tr_readword. This decoupling is 
achieved by using the reply processor. 

The reply processor, like the DMA processor, removes work from a queue of tasks, 
each task being defined by a descriptor. Because replies are queued until the outputter 
is free to send, a reply generating input need only arbitrate for the memory system, 
and not the outputter as well. 

Replies are formulated at the input and put on a reply queue. The reply processor 
is responsible for taking replies off the queue and turning them into packets and 
inserting them onto the network. A reply takes the form of a number of read results, 
up to three, and a tr_remotereply. Each read specifies a return write address, 
and the tr__remotereply specifies the processor and context in which those writes 
will occur and an event to be set. 

6.1 Reply Descriptor 

The reply queue has a fixed entry size of 8 words, or 32 bytes, and takes the form: 



word no 


Contents 


Description 





Process, Context 


Destination process 


1 


Address 


Event to set after writes 


2 


Data 


Data write for address 1 


3 


Address 


address 1 Word 


4 


Data 


Data write for address 2 


5 


Address 


address 2 Word 


6 


Data 


Data write for address 3 


7 


Address 


address 3 Word 
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The reply processor will produce a packet with three tr_wr it e words and a 
tr — set event. If any of the write addresses are NULL then the reply processor 
ends the packet and sends the tr_set event. The address of the event may also be 
NULL, and this will be interpreted by the inputter as a NULL operation. 

The descriptor gives the context on which the reply is to be formulated, and the 
process number to which it is to be output The context and process number need to 
be translated by the reply processor; note that this is done at the same time as the reply 
is output Placing the translated values of hardware context and processor number in 
the queues would not allow a context to be disabled rapidly at the translation tables. 
The context is held in the low half of the word, the destination processor number is 
held in the upper half. 

6.2 Exceptions 

The reply processor may generate any of the following exceptions: 

OUTPUT_INVALID__ROUTE 

OUTPUT_INVALID_PROCESS 

OUTPUTJTIMEOUT 

Reply processor specific exceptions: 

RPROC_NACKED 

A RPROC_NACKED trap is generated if a reply packet generates a Nack. 



meko S1002-10M102.04 20 



COMMAND PROCESSOR 



7.1 Command Processor Specification 

The command processors purpose is to provide a checked calling interface between a 
UNIX system and the ELAN processor. This allows ELAN processes to be initiated 
using an area of paged memory as the protection mechanism. In this area different 
pages initiate different commands, access to these pages can be controlled by the 
kemal executing on the UNIX system. 

The command processor serves an additional purpose in the initial implementation, 
turning physical interrupts, into EVENT signals. 

The command processor is activated by either a physical interrupt becoming ready 
or a command port slot being set full. The first thing the command processor does is 
determine which type of event caused it to wake up. 

7.2 Command Source 

Do the command in the command port and when done place the parameters in the 
data register and clear the full bit. 

Commands available on the comms processor are detailed in appendix E "Command 
Processor Instructions". 

7.3 Registers 

The following internal and external registers are used to control the command 
processor: 
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Command Context Table 

For commands requiring a context, a field of 2 bits in the command value are used to 
determine which context from a vector of 16 bit contexts is to be used. The value is 
looked up by the command processor. The encoding of the context selection in the 
command value allows contexts to be guarded using page table entries. 

The context translation vector consists of a single double length register, which is 
sufficient to store 4 contexts. (More can be supported) 

Interrupt Base Register 

This points to the vector of event locations that are to be used for internal interrupts. 
The events are double word locations so for example interrupt 5 has a vector entry of 
this base register plus ten. 

CommsProcIntMaskReg 

This is an interrupt mask register, and is ANDed with the Interrupt register to 
determine whether an interrupt is to be taken by the ELAN. This register is used 
in conjunction with the main processor interrupt mask register in the slave memory 
map to control the taking of interrupts. 



7.4 Command Port 



The command port is a memory mapped object in the Elan processors slave address 
space through which the main processor can queue up commands with a single 
read/write word swap operation. 

A basic command register consists of a 32 bit data word, and a 32 bit command word. 
The command register has the following assignments: 

Bit 31 FullnotEmpty 

Bit 30 Finished 

Bit 29 Error 

Bit 28 to 15 Read as Zero 

Bit 14 to 9 Command type 

Bit 8 to Immediate 

A command port may have multiple command-data register pairs. 

Commands may be issued by writing to the register pair and ensuring that the full bit is 
set. The command processor signals completion by clearing the full bit. Commands 
may return values in the 32 bit data register. Scheduling a command using this 
mechanism would require polling the command register to check that it is empty, 
then writing to the command data pair, requiring multiple read and write operations. 
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In order to minimise the overhead of issuing a command, a second image of the 
command register is provided. This memory image is guarded so that a single word 
swap operation is sufficient to issue a command. 

Within the command issue image, reading and writing have the following effect. Any 
read will return the number of the next command slot that is free. If all the command 
slots are full a negative number is returned. If a command slot is free then a word 
write causes data to be written to the next free data slot, and the command to be set 
to full with the lower 15 bits being taken from bits 17 to 3 of the address. A write 
when there are no free slots still acknowledges to the memory interface, however no 
state is modified. 

Note that although this is a special use of read and write, data within the command 
processor is still only changed on write memory cycles. 

From the user's perspective to issue a command G with immediate address I and data 
D, he performs a RmW operation to a word vector indexed by C,I. This returns a 
small positive integer (the data register which any read data will be returned in) if 
successful, or a negative value if not. Note that for correct operation when using 
RmWs the value of Full must be sampled at the start of the transaction so that the 
same value is used for both the read and the write. 

Using the command issue image it is possible for the user image to be consistent 
over different implementations with different depths of command queue. So that 
user code is truly command queue length independent, only one outstanding read 
operation should ever be permitted. 

Commands are executed in a round robin order of command registers, ensuring that 
commands are executed in the order they are submitted. 

7.4.1 Command Port Address Map 

The address map of a command port is chosen to map onto a 4k byte paged MMU. A 
command port physical address is aligned to a 256KByte boundary. The 4KByte page 
of this is used for direct access to the command port registers, which are allocated 
in pairs, starting from the first port address. Each successive physical page after this 
maps a different command. (Note that this means that command is invalid as it 
cannot be issued). In particular a command port that uses only one command can be 
mapped in two pages, or the entire command set can be mapped in a single 256KByte 
transaction. Note that mapping of individual pages permits individual commands 
to be mapped in and out of user space. More detail on the ELAN memory map is 
provided in Appendix J "Communications Processor Memory Map". 
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FFiF80000 data reg 

command 

data reg 1 

command 1 

Multiple images of command slots to page boundaries. 

FFiF81000 command 1 zero immediate 
FFiF8100C command 1 immediate 3 



FFiF83C00 command 3 immediate 3x300 

i defined by MBus ID pins 

Commands are denned in Appendix E, "Command Processor Instructions". 

7.5 Interrupt Source 

Upon an internal interrupt becoming ready, ie the Interrupt Register ANDed with 
the CommsProcIntMaskReg being The action on an interrupt is to indirect into the 
interrupt event vector and cause a set event on that location (These are always done 
in contextO). The communications processor interrupt mask is then updated to mask 
out the interrupt source and then the command processor deactivates. 

7.6 Exceptions 

Exceptions that can be generated by an command processor are: 

DATA_ACCESS_EXCEPTION 
EVENT_INTERRUPT 
QUEUE_OVERFLOW 
MEM_ADDRES S_NOT_AL IGNED 

Unimplemented commands are not trapped since they can only be caused by wrongly 
mapped MMU pages, and as such are the responsibility of the system code to prevent 
them occurring. 
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8.1 Virtual Events 



The virtual event mechanism provides a generalised implementation of interrupt style 
signalling across a distributed memory machine. The mechanism is asymmetric, 
unlike the OCCAM channel, in that only one side commits. (The OCCAM channel 
style synchronisation primitive can however be built from a pair of events.) An event 
location is set by a thread that wishes to signal, and waited on either by suspending 
or polling by the thread waiting for that signal. The event mechanism may be 
protected so that it acts between processes as a distributed implementation of a trap 
mechanism, (with the important difference that the process signalling the trap does 
not automatically suspend itself). 

The operation of remote events depends on which thread has the event in its address 
space. Generally speaking the most efficient implementation will be one in which 
the thread that is to wait has the event location in its memory space. In this case the 
thread suspends itself locally and is woken up by a setevent network transaction. 

It is also desirable to be able to wait on an event location in a remote process 's memory 
space. For reasons of security the actual waiting thread address cannot be exported 
to another process. Instead the thread exports the address of an event location local 
to its own memory, and then suspends itself on that local event. 

A process from the main processor may also wait on an event location. In this case 
the value written to the wait location is distinguished by the number in wordO. The 
interrupt is not signaled immediately, but is put on the run queue and flagged when it 
reaches the top of the queue. 
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8.1.1 Event Locations 



The event location consists of two 32 bit words. This pair of words must be double 
word aligned. The two words can have the following values: 



WoidO 


Wordl 


LSBS 
(wordl) 


X 





00 


X 


1 


01 


proc 


DWaddr 


00 


count 


queue 


10 





queue 


11 


-1 


Iaddr 


00 


-2 


Waddr 


00 


-3 


Daddr 


00 


-4 


Dummy 


00 


<-4 


Eaddr 


00 



meaning 

clear event location 

ready event location 

remote thread waiting (DWaddr = remote event address) 

clear queued event location (queue may be full) 

ready queued event (queue is empty) 

main processor waiting for interrupt 

local thread waiting 

DMA desc waiting 

Null event waiting 

Local event waiting 



Note that the two LSBs of the waiting address always have the following meaning: 

Bit The event is ready ( 1 ) or not (0). 

Bit 1 A queue of waiting processes begins at the location pointed to by waiting 
address. The number of entries on this queue is held in the proc number 
location. If this is zero the fptr must be zeroed by the next process to wait 
on the queue. 

A waiting object is now indicated by bits [31:2] of the second event word being non- 
zero. The first word is not effected by set event, unless the event was a queue and 
contained a waiting item. 

typedef struct EVENT 

{ 

union { 

int32 count; 

proc_t procid; 
} wordO ; 
union { 

thread_t *thread; 

queue_t * queue; 

int32 value; 
} wordl; 
} event t; 
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#def ine event__count wordO . count 
#define event_procid wordO.procid 
#define event_thread wordl. thread 
#define event_queue wordl. queue 
#define event_yalue wordl. value 

#define READY (ev) (ev-> event_value & 1) 
#define HASQ (ev) (ev-> event_value & 2) 

typedef struct eventQ 

{ 

int32 eq_size; 

int 3 2 eq_f r ont ; 

EVENT eq_Q[]; 
} 

8.1.2 Queued Event Locations 

So that multiple threads can wait for an event, event queues are provided. An event 
queue address must be double word aligned. The structure of an event queue is, 

int size; // number of queue slots 

int fptr; // index into queue pointing 

to front . 

struct qentry queue [size] ; // an array of queue 

entries. 

The number of items in the queue is the value in wO of the event location. The value 
of index is always reset to zero when the queue is empty. 

8.1.2.1 Queued Event Location Entry 

A queue entry consists of a double word. The double word can contain either a local 
thread address, a local interrupt address, or a remote set address. A null entry on the 
queue, of zero, zero, is also allowed. This allows items to be simply deleted from 
queues, but note that it does not immediately free up queue space, also that it will 
cause the queue to appear to have threads waiting on it, and a set event ignores a null 
entry, ie it will try to move onto the next entry in the queue if one exists. In this case 
if no entry exists the event location will be left set. 

A queue entry consists of a double word and is similar to an event location in structure. 
The two words can have the following values: 

WordO Wordl meaning 
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proc 


DWaddr 


-1 


Iaddr 


-2 


Waddr 


-3 


Daddr 


-4 


X(<>0) 


<-4 


Eaddr 



remote thread waiting (DWaddr = remote event address) 

main processor waiting for interrupt 

local thread waiting 

DMA desc waiting 

Null event waiting 

Local event waiting 



8.1.3 Operations on an Event Location 



The following operations can occur on an event location. 

Local wait. 
Local set 
Remote wait. 
Local clear 
Local/Remote Test. 



8.1.3.1 Local Wait 

A local wait is executed by the thread processor. This tests the value of the event 
location for readiness. If ready the event is cleared and execution continues. If not 
ready the thread address is either stored in the event, or if queued in the attached 
queue location. If either a single event already has something waiting, or a queue 
exists but is full, a queue overflow exception is generated. 

8.1.3.2 Set Operations 

Local set is executed by the thread processor. This tests to see whether the event has 
anything waiting on it. If not the the ready bit is set. If something is waiting in the 
event, then the event location is reset to zero and the event done. 

The event may either be, 

remote thread Remote set event is put on the reply queue. 

local thread Thread is scheduled. 

event interrupt An interrupt is generated. 

Sets on a queued event location proceed in a similar manner except that the data is 
found in the queue. Remote sets operate identically to local sets, except that access 
permissions are different ie RemoteEvent instead of LocalEvent. 
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8.1.3.3 Remote Wait 

Remote waits pass a double word value consisting of a process and an address in that 
process that fully describes another event. The process value must be non NULL. 
For a valid process value if the event is ready, it is cleared, and a remote set placed 
on the reply queue. If the event is not ready the thread is put in the event location. 

8.13.4 Local Clear 

Gear event clears the ready flag. Any queue is left unaltered. 

8.1 .3.5 Local and Remote Test 

The status of an event can be tested either locally or over the network. The status of 
any event can be checked simply from the event double-word, as a non zero value in 
wordO, means something is waiting, and a set ready bit means the queue is ready. 

8.1.4 Atomic Access Considerations 

Queue handling must be atomic. Communication processors ensure atomic access 
by locking the memory bus for the whole of the access sequence. Programs running 
on the main processor should lock out the communications processor from memory 
access by using the memory access inhibit mechanism in the communications 
processor. 

8.1.5 Protection 

Event locations have a special level of memory protection. Local threads have 
complete read write access to events mapped into their memory space. Remote 
threads however are only allowed to do set operations, and remote waits, and tests in 
store areas with remote event only permission. This ensures that thread creation is 
not possible by outside threads, and that the only threads running in a context will be 
those created locally. (Remote running of threads is of course possible in an area of 
store with remote write and event permission, by remote writing the thread address 
to some location and immediately setting it). The queue areas should be in areas of 
store with local R/W permission only. The inputter needs to access these with the 
required access types for queue handling operations. 
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8.2 Exceptions 

Exceptions are indicated by a non zero value in the trap type field of the status 
register of the processor with the exception. This generates a maskable interrupt. 
The exception is not set until the processor has put itself into a known state in the 
exception area, the processor then stops executing. The wakeup function in the status 
register is set to WakeupNever. 

The main processor handles exceptions through the command port. The exception 
handler extracts the queue item which caused the exception from the internal registers 
of the ELAN processor. The main processor may then restart the unit, by updating 
the wakeup address then the wakeup function in the status register. 

The process exception area the following layout for each processor. 

STATUS Status register at point where exception was noted. The Memory error 
bit may be set, in which case the MMU error locations are valid. These 
are in the next two locations. 

FSR MMU fault status register. 

FADDR Fault address. If the exception was the result of an EVENT interrupt 
to the main processor this is the address value from the EVENT. 

CONTEXT Value of CurrContext at time the exception was detected This may not 
be the value of the context for the process for which the error occurred 
in the case of MBus errors, ie BUS ERROR, TIME OUT ERROR and 
UNCORRECTABLE ERROR. 

8.2.1 Exception Sources 

NO_TRAP 

Reset state of trap type. No trap has occurred 

D AT A_ACCES S_EXCEP T ION 

Error detected by MMU. In this case FSR and FADDR are valid. Meanings of 
FSR and FADDR are defined in appendix H "MMU User Guide". 

MEM_ADDRESS_NOT_ALIGNED 

Virtual address was mis-aligned, (eg non zero bottom two bits for word access) 
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OUTPUT_INVALID_PROCESS 

(outputting processes only) 

Attempt to output to a virtual process number not currently mapped in the 

virtual process table for this context. 

OUTPUT JTIMEOUT 

(outputting processes only) 

Output device timed out. The outputing process had opened the packet for 

greater than 4 milli-seconds before checking the packet acknowledge. 

EVENT_QUEUE_OVERFLOW 

(only on threads processor, or input processor) 

Attempted to wait on a queued event with insufficient space left in the queue 

structure. 

QUEUE_OVERFLOW 

Processor attempted to queue a descriptor on reply, thread or DMA queue 
where there is insufficient space left in the queue structure. The descriptor 
queue structures for these processors have 16 locations dedicated to contextO 
processes. 

Any unit which performs set events can cause overflows on either the thread 
queue or the reply queue; whether it is the thread queue or the reply queue 
depends on whether the object in the event is local, or remote. All units apart 
from the reply processor perform set events. 

Various explicit thread operations can cause overflows. A DMA instruction can 
cause a DMA queue overflow, and a RUN instruction can cause a thread queue 
overflow. 

The command processor has a command to enqueue an item onto any of the 
three queues, each of which may overflow. 

The input processor can cause DMA queue overflows through remote DMA 
operations, and reply queue overflows by remote read operations. 
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UNIMPLEMENTED 

Unimplemented is a valid trap for all the processors except the reply processor. 
The command processor unimplemented is not currently checked, as this can 
be guarded by the main processors MMU. 

PACKET_SEQUENCE_ERROR 

This is generated by any thread instruction on the output device which occurs 
in the wrong order, (ie, SENDJTRANS, without OPEN, or multiple OPENs). 

DPROC_FAIL_COUNT_ERROR 

DMA processor decrements its fail count every time it receives a Nack. When 
this value is zero an exception is generated. 

RPROC_NACKED 

The reply processor has no fail count so generates an exception immediately if 
it is Nacked. 

IPROC_ERROR 

Caused by CRCerror, BadEOP, BadLength. Cause of error detectable from 
status register. 

8.2.2 Exceptions During Output 

An exception can occur on either the thread, reply, or DMA processor while it has the 
output device open. The microcode which generates an exception will automatically 
close down the devices and sends an EOP error down the line. The status register 
indicates whether or not a processor had the output open at the time of an exception. 

8.2.3 Exceptions During Input 

If an input exception causes a trap on one of the two inputters it may be necessary for 
the trap code to read the received transaction from the comms processor's internal 
memory. Each input process has a cyclic input buffer of 16 words. The internal ram 
locations 'TEMPJNPUTO* and 'TEMPJNPUTl' contain the internal ram pointers 
to the beginning of the transaction in this buffer dependent on the inputter number. 
They are held in an internal format that is decoded by ANDing the read value with 
Oxf and using this as an index into the respective input buffer. 
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ie If TEMPJNPUT1 = Oxce, then the index is Oxe. The transaction code will be 
found at address Oxde and the address at Oxdf. Data words for the transaction would 
be found at OxdO, dependent on the size bits set in the transaction word. 

The external location 'NextTransFront' contains the pointer to the place where the 
next transaction will be placed in the buffer. This may be used to delimit the 
transaction but using the size from the transaction itself will probably be quickest 
since this will require less slave reads. 

More complex exceptions should be dealt with by turning on the context filter for 
the exceptional context. The remaining transactions of the current packet should be 
extracted in turn until the EndOfPacket. If the packet isn't going to be dealt with 
correctly it should additionally be NACKed if possible. Subsequent transactions can 
be extracted in a timely fashion by asserting the trap on transaction function. 

The following microword restart addresses can be used: 

NAckAndMoveToNextTransaction 

Will NAck the packet and move to the next transaction. It increments the buffer 
pointers and tells the input queue logic that the transaction has been consumed. 
If the Ack has already been sent then a IPROC_NACK_AFTER_ACK 
exception will occur. 

MoveToNextTr ansact ion 

Will move to the next transaction in the buffer or wait for one to arrive. It 
increments the buffer pointers and tells the input queue logic that the transaction 
has been consumed. 

HandleTransaction 

Restarts the inputter at the current transaction. If the transaction hasn't been 
ACKed or NACKed the input can be restarted at the 'HandleTransaction' 
microword. The reason for the exception will need to have been cleared. 
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8.2.4 Exceptions on Command Processor 

Exceptions on the command processor are implemented in a slightly different way 
to the other processors because the command processor is used to access the internal 
state of the ELAN processors. The state of the command processor is held within 
the command ports. These are first extracted (and cleared) and exceptions are 
disabled from the command processor using the CProcReset bit in the ELAN 
command register. Normal command port read and write access to the internal 
registers can be used to clear the command processes status register. The CProcReset 
bit is then disabled. After the reason for the exception has been cleared (i.e 
DATA__ACCESS_EXCEPTION on unmapped descriptor for command port "thread 
processor place on queue" command) the commands can be placed back onto the 
command port, and execution recommence. Note that an internal interrupt from the 
command port can not be handled by the command processor. 

8.2.5 Exception on Reply Processor 

If an exception occurs on the reply processor the trap-handler may need to get the 
descriptor to help diagnosis, or to restart the reply. The descriptor is held in the 
internal registers REPLY_BUFFER_0 to REPLY_BUFFER_7. 

This buffer is always valid during execution of the reply processor. The state 
indicating the buffer's validity is held in the external register ReplyBufferValid. This 
needs to be cleared by the exception handler if the descriptor is to be removed. 

The reply processors status register SuspendAddr has to be reset to the default reset 
value, see Appendix L. 



8.3 Interrupts 

Source of interrupts are: 



Exceptions Exceptions from each of the ELAN processors. 

External External active low interrupt lines. 

Alarm Wake up for alarm. 

Hush Hush Error. 

Halted Procs halted status. 
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Events, Exceptions, and Interrupts 



The implementation of the device is an interrupt master or source. When used as an 
Mbus slave device a single interrupt out signal is generated, and the internal interrupts 
register can be read to determine the source of the interrupt. There is an 18 bit 
processor interrupt source register. Also an 18 bit interrupts mask value. The single 
interrupt out line is the OR of the 18 processor masked interrupts. 

83.1 Communications Processor Handled Interrupts 

The comms processor uses the same interrupts but has its own separate mask register. 
The comms processor handles interrupts by turning them into context set events. 
These are located as a vector in store starting at the Intemal_Interrupt_Base location, 
an internal register which must be set up before enabling any comms processor 
interrupts. After the set event has been executed, the comms interrupt mask bit for 
that interrupt is automatically cleared. The actual interrupt handling code, (which 
would be waiting on the event location) is responsible for re-enabling the interrupt 
when appropriate. 
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MMU AND VIRTUAL PROCESS TABLE 
STRUCTURES 



A.l Root Context Table 



The root context table points to the MMU entries for each context. The internal 
register CONTEXT_PTR defines the base of the context control block tables (CCB). 
Each of these four word CCBs defines the level 1 page table entry (PTE) for the MMU 
and the virtual process table. The CCB entry is defined as, 

struct 

{ 

unsigned int context_j?age_table_entry; 

unsigned int virtual_process_table_base; 

unsigned int virtual_process_table_size; 

unsigned int reserved_space; 
} context_control_block; 

The root context table has 65536 entries. Each entry need not be filled however since 
the system code will only be able to generate and allow references to contexts. This 
restriction of reference to contexts is achieved by translating user quoted process 
numbers through a virtual process table (VPT). 



A.2 Virtual Process Table 



The virtual process table is a list of the processes with which a particular process can 
communicate. This list is indexed by the virtual process number. 

struct virtual_j?rocess_table_entry virtual_j?rocess_table[] ; 

A list entry gives the physical processornumber and context number of the destination 
process, as a single word entry, 
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struct 

{ 

unsigned context_n umber : 16; 

unsigned processor_number : 16; 
} virtual_process_table_entry ; 

The upper half of the word is the physical processor number and the lower half the 
hardware context of the process on that physical process. 

-1 is an invalid entry in the table. To reduce the store requirements of 
the VPTs, the context control block has a maximum virtual process number 
(virtual_process_table_size). Virtual process numbers are checked against this before 
indexing in the table, processes of greater or equal number are treated as invalid 
entries. Invalid entries will cause an exception to be generated. 

The virtual process table begins at the physical address given by shifting left 
virtual_process_table_base 4 bits, and zeroing the LSBs. The VPTs are therefore 
32 byte aligned. 



A.3 Route Table 

The route for a physical process is found from the route table. The comms processor 
internal register ROUTE_TABLE_PTR defines the top 32 bits of the physical address 
at which the route table begins. Each entry in the table contains four routes for each 
physical process one of which is randomly selected. The four routes can be set to 
be the same to give a single route. A route is represented by 16 bytes. The route 
table can be placed at the same location in each comms processors store to enable the 
easy removing of a particular physical processor by removing its route. Routes to a 
particular processor can then invalidated by broadcasting a write of the NULL entry 
to each of the four routes. 

struct route_table_entry *route_table_base_register [ j [4] ; 

struct 
{ 

char route [16] ; 
} route_table_entry; 

A.4 MMU Page Tables 

The MMU page tables are referenced by context through the CCB, to the context 
page table entry or level 1 entry. 
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A.5 ContextO Processes 



ContextO is a name that is given to high level processes that are used for booting, and 
error processing; these processes are analogous to the supervisor processes used in 
UNIX. ContextO processes are distinct hardware contexts that have unique privileges 
by virtue of their context number. 

Context processes run at priority over other processes on all queues. However 
lower priority queue items are not interrupted. Priority is established by front of 
queue scheduling. This means that queue overflow checking always leaves room (16 
queue entries) for some contextO processes. 

Hardware contexts 0-3 1 are the contextO process context values. ContextO, hardware 
context 0x10, processes only can wait on the internal event vector. 



A.6 Translations without MMU 



Prior to the MMU being enabled on the ELAN, virtual addresses map onto physical 
addresses without translation, the extra bits PA [35: 32] coming from the lower four 
bits of the context register. 

When the MMU is enabled virtual addresses are translated through the page tables. 

On reset the MMU is disabled. 
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INPUT PROCESSOR 



B.l Format of Input Transaction 

Transactiontype 16 bits 
Context 16 bits 

Address 32 bits 

<Data[]> 32 bytes of data for tr_blockwrite or some even number 

of words of parameters for all other transactions. 

CRC 16 bit CCITT standard CRC 

The bit fields of the transaction type are define as follows: 

<15> ackNow 

<14:12> Size 0, 1, 2, 4, 8, 16, 32, or 64 double words. Bit 14 is forced to zero 
currently so that only size 0, 1, 2, 4 are allowed. 

<11> WriteBlock 

<10:0> -WriteBlock op code 

WriteBlock <10:9> Data type 

<8:0> Part write size 



B.2 Network Transactions 

tr_writeblock 

p0-p7 32 bytes of data 



Write a 32 byte block of data to the address specified in the transaction header. If 
less than a block is to be written the part write size field of the transaction is set to 
the number of bytes to write, and the start address comes from the lower five bits 
of address. Data type is sent with the block so that if the byte order of the receivers 
memory system differs from the network byte order (big endian) byte swapping can 
be performed. 
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trJDMArequest 

p0-p7 DMA descriptor 

Put the DMA descriptor described in parameters p0-p7 onto the DMA queue. Bit 15 
of of pO is set to 1 to indicate that the DMA was created remotely. The context from 
the transaction is inserted into the context field of the DMA descriptor. 

tr_writeword 

pO new value 

Write one word into the address specified in the transaction. 

tr__readword 

pi reply address 

Read one word from the address specified in the transaction. Data value read is put 
on the reply queue along with the reply address. A tr_remotereply is then required in 
the same packet to specify the return process. 

tr_remotereply 

pO reply process 

Specifies the reply process pO, for reply data form remote word reads. The 
transaction address specifies an event address. 

Example: 

Received packet 

tr_readword (addl, , destaddl) 
tr_readword (add2, , destadd2) 
tr_remotereply (event, procB) 

Reply packet sent to "procB" 

tr_writeword (destaddl, mem[addl]) 
tr_writeword (destadd2, mem[add2]) 
tr_set event (event ) 

Note that the number of reads before a reply will be limited to the capacity of the 
reply processor (to 3 in the current implementation). 
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trreadwriteword 

pO new value 

pi reply address 

Perfonn a locked read then write operation and return the read data to the reply 
address. Destination processor must be specified by a tr_remotereply as for 
tr_readword. 

tr_atomicaddword 

pO add value 

pi reply address 

Perform an atomic add of the add value to the specified address. If the reply 
address is other than NULL, then the original value is returned as for tr_readword 
or tr.readwriteword. If there is no need to return the value, the reply address is set to 
NULL, and no tr.remotereply is necessary in the packet 

trtestandwriteword 

pO write data 
pi reply address 
p2 test value 

Perform an atomic compare of the test value with the data at the specified address. If 
the two are equal, then the write data is written, if not equal then nothing is written. 
The read value is always returned as for tr_readword or tr_readwriteword. If there is 
no need to return the value, the reply address is set to NULL, and no tr_remotereply 
is necessary in the packet 

trjsetevent 

no parameters other than address 

Cause an event on the virtual event location specified by the address. If the event is 
already set no action occurs. If there is a thread waiting this is run. 

tr_clearevent 

no parameters other than address 

Gear event specified by the transaction address. There is no handshaking on this 
operation. If die event is not ready, it has no effect 
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tr_waitevent 

pO suspending process 
pi suspending event 

If the event is not ready, it writes the address of a remote set location into the suspend 
location of the event specified by the transaction address. This may be either a queue 
or a single location. If the event is ready, the reply processor performs a set to the 
suspending event, and the event is cleared. 

tr_eventready 
trnoteventready 

no parameters other than address 

Poll an event location to see whether it is ready. The value of the condition is returned 
via the ACK/NACK mechanism described previously. Both senses are provided since 
NACK may mean that the packet has been rejected rather than the condition is false. 

tr_EQ 
tr~NEQ 
trJjTE 
trJLT 

pO testvalue. 

Perform specified comparison between the test value and the word location specified 
by the address field of the transaction. Return value via the ACK/NACK mechanism. 

B 3 Transaction Type Code Summary 

Basic memory transactions size opcode conditional 

ttiwriteblock 4 no 

tr_writeword 
tr_readword 

tr_remotereply 

trJDMArequest 

Locked memory transactions 

ttireadwriteword 

tr_atomicaddword 

tr_testandwriteword 
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1 


01000 


no 


1 


01001 


no 


1 


01010 


no 


4 


01011 


no 


1 


01100 


no 


1 


01101 


no 


2 


oino 


no 
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Input processor 



Event transactions 



tr_setevent 

tr_clearevent 

tr_waitevent 

tr_eventrcady 
tr_noteventready 

Conditional transactions 

tr_EQ 
tr_NEQ 
UJGTE 
trJLT 

Unimplemented transactions 






00000 


no 





00001 


no 


1 


00010 


no 





10000 


yes 





10001 


yes 


1 


11000 


yes 


1 


11001 


yes 


1 


11010 


yes 


1 


11011 

01111 
00011 
OOlxx 
lOOlx 
lxlxx 


yes 



Note: Set Event NULL is NULL transaction; the value of the transaction is all zeros. 



B.4 Input Reply Buffer 



On receiving a network read command the input checks that it has access to a reply 
buffer. This is the place where it will store the reply data until it inputs the reply 
command. If one is not available the input will attempt to send a N ACK, (and trap if 
an ACK has been sent). The input will read the READ data and place it in the buffer. 
Also in the buffer it places the address, from the network instruction, where to store 
the data. It then moves onto the next transaction. 

On receipt of the first READ in a group the input will take ownership of a buffer 
(On the initial ELAN there is one buffer shared by two inputters). It will retain this 
ownership until the EOP is received or a NACK is forced. 
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B.5 Input Context Filtering 

The input context filter is an architectural feature which enables a number of contexts 
to be ignored by the inputer. Any such match on the input causes a NAck to be sent 
immediately without touching the store system. Han 1.0 implementation has only 
one such context filter available as an external memory location. The filter is enabled 
with the MMU and so should be set to a non-existent context when not required, for 
example OxfHf . 

The context filtering mechanism might be used by an operating system to reduce the 
activity of a context on a node during resolution of an exception. 

The context filter can be read and written as an external register, InputContextFilter. 
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C.1 Run Queue Entry 

The processors run queue is a front and back pointer queue. Each entry is 32 bytes. 
The entry together with the de-scheduled register window stored in the stack frame 
enables any of the processor state to be reloaded. 

STRUCT run__queue__entry 

{ 

unsigned int context, address; 

unsigned int nPCjlnc, cc; 

unsigned int reservedO, reservedl; 

unsigned int reserved2, reserved3; 
} 

On running a process off the queue the nPC.inc is added to the PC + 4 gained from 
the memory image of the suspended process. The nPC_inc and cc are set to zero if 
the process had suspended by waiting, or the suspend instruction. However on an 
exception restart the queue entry can be altered to reflect any instruction having been 
executed or about to be executed dependent on the way an exception was handled. 

C.2 Software 

C.2.1 Thread Processor User Stack Frame 

Upon de-scheduling the processor state must be saved. This is stored relative to the 
SP. The SP is quoted as being the Wptr. 
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FP, (old SP) -> 



(Wptr) SP -> 



SP 


- 32 


+ 


28 


SP 


- 32 


+ 


24 


SP 


- 32 


+ 


20 


SP 


- 32 


+ 


16 


SP 


- 32 


+ 


12 


SP 


- 32 


+ 


8 


SP 


- 32 


+ 


4 


SP 


- 32 







C.3 



Program Stack 



I 
I 

117 
116 
115 
114 
113 
112 
111 
110 

I blank 
I blank 
I blank 
I blank 
I blank 
I blank 
I blank 
I blank 



Stack frame 



(Return iptr) 
(FP) 



IFP 
I iptr 
|o5 
|o4 

|o3 
|o2 
lol 
Io0 



(06 or SP not saved) 



Previous frame 



Current frame 



Instruction Set 

C3.1 Local Instructions 



The local instructions are optimised for communications with additions for the local 
event control, locked memory operations, scheduling and DMAs. 



General Instructions 


LD 


ST 


ADD (ADDcc) 


SUB (SUBcc) 


AND (ANDcc) 


ANDN (ANDNcc) 


OR (ORCC) 


ORN (ORNcc) 


XOR (XORcc) 


XNOR (XNORCC) 


SL 


SR 


SETHI 




SAVE 


RESTORE 


Bice 





SRA 



meko 
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CALL JMPL 

Scheduling Instructions 

SUSPEND BREAK RUN 

Local Event Control 

WAITEVENT SETEVENT 

Locked Memory Instructions 

SWAP ATOMICADD TESTSTORE 

On the Elan revisions A, B and C only the SWAP instruction is implemented. 

DMA Control 

DMA 

C32 Network Instructions 



Packet generation instructions. 

OPEN CLOSE SENDTRANS 

C3.3 Instruction Definitions 



General Instructions 

ADD 
ADDcc 
SUB 
SUBcc 



These instructions implement arithmetic operations. ADDcc and SUBcc 
modify all the condition codes. 



Traps: 

(None) 



AND 

ANDcc 

ANDN 

ANDNcc 

OR 

ORcc 
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ORN 

ORNcc 

XOR 

XORcc 

XNOR 

XNORcc 



These instructions implement bitwise logical operations. ANDcc, ANDNcc, 
ORcc, ORNcc, XORcc and XNORcc modify all the condition codes. 

Traps: 

(None) 



SLL 
SRL 
SRA 



These instructions implement logical shift left SLL, logical shift right SRL 
and arithmetic shift right SRA. The shift count is the five least significant bits 
of r [rs2 ] if the / field is zero or simml3 if the i field is one. 

These instructions do not modify the condition codes. 

Traps: 

(None) 



SAVE 



The SAVE instruction saves the current window frame into the stack. 
Otherwise the SAVE instruction acts like an ADD that always writes its result 
to SP. The amount by which the SP is adjusted by should always be a multiple 
of 32, since the window frames are stored on 32 byte boundaries. 

The out registers of the current frame become the in registers of the new 
frame. 

Traps: 

DATA ACCESS ERROR, MEM ADDRESS NOT ALIGNED 
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RESTORE 



LD 



ST 



The RESTORE instruction resorts to the previous window frame on the stack. 

Otherwise the RESTORE instruction acts like an ADD that always writes its 
result to SP. The amount by which the SP is adjusted by should always be a 
multiple of 32, since the window frames are stored on 32 byte boundaries. 

The in registers of the current frame become the out registers of the new 
frame. 

Traps: 

DATA ACCESS ERROR, MEM ADDRESS NOT ALIGNED 



This instructions loads a word from memory into the r register denned by the 
rd field. If the load traps the register is unchanged. The effective address for 
theloadiseither n r[rsl] + r[rs2]" ifthei field is zero or "r [rsl] 
+ simml3 n if the i is one. 

Traps: 

DATA ACCESS ERROR, MEM ADDRESS NOT ALIGNED 



This instructions stores a word from the r register defined by the rd field 
memory into . If the load traps the register is unchanged. The effective 
address for the load is either n r [rsl] + r [rs2] " if the i field is zero 
or "r [rsl] + simml3" ifthei isone. 

Traps: 

DATA ACCESS ERROR, MEM ADDRESS NOT ALIGNED 
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Scheduling Instructions 

SUSPEND 



SUSPEND de-schedules the cuirent process. The value of the iptr is placed 
in 06, which normally contains the SP. The ins are written into the stack 
frame and the augmented outs to the eight words below the stack frame 

Traps: 

RUN_QUEUE_OVERFLOW, DATA_ACCESSJERROR 
BREAK 

BREAK places the process (SP) on the back of the run queue, and SUPENDs 
it A break in execution is guaranteed, but note that contextO processes will 

not BREAK, 

Traps: 

RUN_QUEUE_OVERFLOW, DATA_ACCES S JERROR 
RUN (rs2) 



RUN takes the value in rs2 and assuming it to be a valid thread for the current 
context places it on the rear of the run queue. 



Traps: 



RUN_QUEUE__OVERFLOW,MEM_ADDRESS_NOT_ALIGNED, 
DATA ACCESS ERROR. 



Local Event Control 

Local events are executed by the output process. These instructions are atomic to 
operations occurring at the input 

WAITEVENT (rs2) 



Test the event location pointed to by rs2. If the event is set it is cleared and 
the process continues. If the event is not set then the thread suspends itself 
on the event Multiple threads can be suspended on queueing events. 
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Traps: 



EVENT_QUEUE__OVERFLOW,MEM__ADDRESSJNOT_ALIGNED, 
DATA ACCESS ERROR, DATA ACCESS EXCEPTION. 



SETEVENT (rs2) 



Set the event pointed to by rs2. If a local thread is waiting then the event is 
cleared and the thread run. If the suspend location contains a remote event 
location, then a remote set is queued on the reply processor. 



Traps: 



RUN_QUEUE__OVERFLOW, MEM__ADDRES S_NOT__ALIGNED, 
DATA ACCESS ERROR, DATA ACCESS EXCEPTION. 



Locked Memory Instructions 
NOT YET IMPLEMENTED 

ATOMICADD (rsl, rs2, rd) 



ATOMIC_ADD does an atomic read add write operation with the value at 
address rsl and the value in rs2. The read value is placed in rd. 



Traps: 



MEM_ADDRESS_NOT_ALIGNED, DATA_ACCESS__ERROR, 
DATA ACCESS EXCEPTION. 



TESTSTORE (rsl, rs2, rd) 



TESTSTORE does an atomic read test write operation with the value at 
address rsl, the test value in rs2 and the new value in rd. The value of the 
read is placed in rd. If the test value is equivalent to the value of the read then 
it is replaced by writing the new value else the location is unchanged. 



Traps: 



MEM_ADDRESS_NOT_ALIGNED, 

DATA ACCESS ERROR, DATA ACCESS EXCEPTION. 
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Network Instructions 

All network transactions are sent in the currently open packet If a packet is not open 
a packet sequence trap will occur. 

OPEN (rs2) 



Opens a packet to be transmitted to the virtual process number indicated by 
rs2. A translation of the virtual process number to physical processor and 
context is first executed. This is done by indexing into the virtual process 
table and returning the destination physical processor number and context 
The context is held in a temporary register during the outputing of the packet 
and added to each transaction. The processor number is used to index into a 
table of route data. These are fetched and placed at the outputer. 



Traps: 



PACKET_OPEN_FAILED, DATA_ACCESS_ERROR, 
PACKET SEQUENCE ERROR. 



Implementation Note: route bytes maybe cached on a last used basis. 

CLOSE <rd) 

CLOSE is matched to open and delimits a remote packet rd is set to 1 
or according to the value of the acknowledge returned by the packet 
CLOSE waits (without de-scheduling) for the receipt of the acknowledge and 
authorises sending of the EndOfPacket signal. CLOSE returns with 1 in rd if 
an ACK was received and otherwise. 

The value of a conditional transaction may be reflected in the value set by a 
CLOSE. 

Traps: 

PACKET_SEQUENCE_ERROR 
SENDTRANS (imm, rs2, rd) 

Send transaction instruction, used to launch transactions into the network. 
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Generate the transaction specified by vmm to the context currently opened, 
with the address in rd. Cause a trap if not currently open. If rs2 is not %g0 
the parameters for the transaction come from a block of memory pointed to 
by rs2 else come from registers %o0 to %ol as required. Only the number 
specified in the SIZE field of the transaction get used, so if the SIZE is zero, 
both %o0 and %ol are ignored. For transactions which need more than 
two word parameters ie those with a SIZE field larger than or 1 (double 
words), the block version must be used. In practice this is only required for 
the trJDMArequest and trjestandwrite. 

The imm field is a 16 bit field, bits [20:5] and as such is fixed at compile time. 

Traps: 

PACKET_SEQUENCE_ERROR, MEM_ADDRESS_NOT_ALIGNED, 
DATA_ACCESS_jERROR,DATA_ACCESS_EXCEPTION. 

DMA Control 

DMAs are performed by a separate engine with its own work queue. The DMA 
instruction places DMA descriptors on this queue. The DMA engine and the thread 
processor are of equal priority so that where both have work to do both make progress. 
DMA processes are timesliced to prevent long DMAs from hogging resource. 
The timeslice interval is chosen so that the overhead on memory bandwidth of 
rescheduling the DMA descriptor is small compared to the amount of data transferred 
in the timeslice period. Synchronisation with the completion of a DMA is achieved 
by passing the descriptor of an event location to the DMA engine, then executing a 
WAIT_E VENT. The DMA engine on completion will cause a SETJEVENT. 

DMA (rs2) 



DMA places a store to store DMA descriptor on the DMA queue. The 
descriptor is pointed to by rs2 and is 32 byte aligned. 

Traps: 

DMA QUEUE OVERFLOW, DATA ACCESS ERROR 
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C.4 Opcodes 

C.4.1 Format 1 Opcodes 



op 

01 CALL 

C.4.2 Format 2 Opcodes (op « 00) 



op2 

\begintable { } 
opcode & usage & op2 

SENDTRANS & rs2 rd & 001 
Bice & none & 010 
SETHI & rd & 100 
\endtable 

\sectionthree{ Format 3 Opcodes (op = 10) } 

\beginprogram{ } 
op3 

000000 ADD 

000001 AND 

000010 OR 

000011 XOR 

000100 SUB 

000101 ANDN 

000110 ORN 

000111 XNOR 

010000 ADDcc 

010001 ANDcc 

010010 ORcc 

010011 XORcc 

010100 SUBcc 

010101 ANDNcc 

010110 ORNcc 

010111 XNORcc 
100101 SLL 

100110 SRL 

100111 SRA 
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110110 


CPopl 






111000 


JMPL 








111100 


SAVE 








111101 


RESTORE 






C.43 


Format 3 Opcodes (op = 


11) 


op3 










000000 


LD 








000100 


ST 








001111 


SWAP 








C.4.4 


CPopl Opcodes (op = 10, 


op3 = 110110) 


opcode 




usage 


ope 


Hex Opcode 


BREAK 




none 


000000000 


Ox81bOOOOO 


SUSPEND 


none 


000000001 


0x81b00020 


CLOSE 




id 


000001000 


0x81b00100 


RUN 




is2 


000010000 


0x81b00200 


OPEN 




is2 


000010001 


0x81b00220 


SETEVENT 


rs2 


000010010 


0x81b00240 


WATTEVENT 


rs2 


000010011 


0x81b00260 


DMA 




rs2 


000010100 


0x81b00280 


ATOMICADD 


isl rs2 rd 


000011000 


0x81b00300 (Not implemented) 


TESTSTORE 


rsl rs2 rd 


000011001 


0x81b00320 (Not implemented) 



Bits [3:4] of ope define register usage 
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PROGRAMMER'S INFORMATION 
DMA PROCESSOR 



D.1 DMA Descriptor 

The DMA descriptor is a command to the DMA processor, it causes a DMA to be 
executed, and after successful completion a number of event locations to be set A 
number of fields in the descriptor define control of the DMA, these are: . . . 

DMASize 

The number of bytes required to be written to destination. 

SourceAddress 

Virtual address indicating where in the local store the bytes are to be read from. 

DestAddress 

virtual address indicating the start address of where the bytes are to be written. This 
address is in the memory space of the destination virtual process. 

DestProcess 

To allow DMAs to act across the network DMA nominates a virtual process number 
of the destination process. This will be translated in the context of the descriptor. If 
the destination process is negative the DMA happens locally and the the destination 
event is set in the local memory space. 

LocalEvent 

After successful completion of the DMA this event location is set Null is taken to 
nominate no event location. 

DestEvent 

Nominates virtual address of second event to be set on destination process. 
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D.2 



Context 

On the queue this indicates the context the DMA is to be run in locally. When a 
descriptor is created the context is not included, but is added by the enqueueing 
process. This is done by taking the current context, shifting up the type field which 
must appear in the lower 16 bits, and ORing it in. The completed descriptor is then 
added to the queue. The current context is taken from the CurrContext register for the 
enqueueing process. This is a security feature disabling users from faking contexts 
and is used by the other process queues. When on the queue the context of the queue 
entry is in the lower 16 bits of the first descriptor word. 

DMA Queue 

The DMA queue is physically addressed, and all DMA block descriptors are aligned 
to 32 byte boundaries. Descriptors on the queue conform to the format below. 

D.2.1 DMA Descriptor 



Type | Context 

DMASize 

SourceAddress 

DestAddress 

LocalEvent 

DestProcess 

DestEvent 

SPARE 



DMA Type 

Type [1:0] Data type, Byte, Short, Word, Double Word (0,1,2,3) 

T^pe [2] The set location given is on the destination process. 

Type [8:3] DMA Opcode 

T^ype [14:9] Fail count How many packets can fail before DMA is trapped. 

Type [15] Reserved 

D.2 2 DMA Descriptor ELAN 1.2 

DestProcess must be negative -5 to give previous local DMA behaviour. This frees 
up process number as a valid process number. 
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D 3 DMA Status Information 

The state of the DMA processor visible to the main processor is, 

Status Register 
Current Descriptor 
DMA\_FPTR 
DMA\_COUNT 

The status register, DMA_FPTR ande DMA_COUNT give information about the 
state of the DMA queue. The state of a descriptor being worked on is held in the 
internal register as detailed in Appendix N. This descriptor is only updated as the 
DMA is definitely done. The descriptor read from the internal registers on a trap may 
detail some part of the DMA which has already been done. 

D.4 DMA Processor Opcodes 

operation opcode 

d_dma_secure 000 
d_dma 001 

All other opcodes trap as unimplemented DMA instructions. 
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COMMAND PROCESSOR 
INSTRUCTIONS 



E.l Commands 



Each command can be executed with a different context selected from the command 
context table. 



E.1.1 Queue Commands 



Places item pointed to by address onto the queue. The address is assumed to be 
virtual and is translated through the MMU using the indicated context The address 
is assumed to point to a 32byte aligned block. The first word is shifted up discarding 
the top 1 6bits, the context, from the table is ORed in such that user code cannot insert 
contexts. 

Notes 

The Remote/Local source bit for the DMA descriptor is not altered from the value in 
the descriptor passed in. 

The context is filled in from the vector of contexts. 



cp_REPLY_CO 6*h04 

cp_REPLY_Cl 6'h05 

cp_REPLY_C2 6'h06 

cp_REPLY_C3 6'h07 



cp_run_C0 


6'h08 


cp__run_Cl 


6'h09 


cp_run__C2 


6'hOa 


cp_run_C3 


6'hOb 


cp_DMA_C0 


6'hOc 


cp_DMA_Cl 


6'hOd 


cp_DMA_C2 


6'hOe 
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cpJDMA__C3 6'hOf 

E.1 2 Read Comms Processor Registers Commands 

Internal and external register reads and RmW can be performed through the command 
port The address of the register is taken from the immediate field directly. In this 
way all the comms processors registers can be accessed. 

cpjfceadjntemal 6'hlO 

cpJtaWJntemal 6*hl4 

cpJRead_extemal 6'hl8 

cpJRmW__extemal 6'hlc 

E.1.3 Event Commands 

Data register defines address of an event in current context 

cp_SETEVENT_CO 6'h20 

cp_SETEVENT_Cl 6*h21 

cpJSETEVENT_C2 6*h22 

cpJSETEVENT_C3 6*1123 

cpJXEAREVENT_CO 6'h24 

cpJXEAREVENT.Cl 6*h24 

cpJXEAREVENT_C2 6'h24 

cp_CLEAREVENT_C3 6*h24 
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TIMER, ALARM AND HUSH 
PERIPHERALS 



F.l Clock 



F.2 Ahum 



The timer is a memory mapped peripheral which resides on the Elan processor device. 
This peripheral can create interrupts which are directed through the interrupt logic. 
Their are three parts to the timer a clock, an alarm and a hush location. 



nS time counter incremented in 200nS counts 

S time counter (Ox3b9acaOO comparator on nS counter) 



The clock provides a readable nS counter. This is implemented as two 32 bit registers 
the first counting in nanoseconds and the second incrementing in seconds. The nano- 
seconds count is arranged to zero itself at l,000,000,000nS and increment the Sec 
counter. The timer is normally read as a double word value. It can however be 
accessed as two word locations but aliasing will be observed. 

The counter increments in 200nS counts, from a 5MHz crystal. This is intended to 
provide a granularity of less than 10 instruction cycles in the system where the clock 
rateis40MHz. 



A 32 bit register which is decremented by 200 at 200nS intervals. If the value is or 
becomes negative (top bit set) an interrupt will be signalled. 
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R3 Hush Register 

The hush register is a memory mapped device, used to prevent the communications 
processor accessing the MBus. It is used to allow a host processor to atomically 
access data used by the comms processor. 

The hush register when written to 1 'bl, prevents the comms processor accessing the 
memory bus. If it is not written to 1'bO within approximately 0.5mS, or it is written 
to l'bl again before being reset then a HushError interrupt will occur and the comms 
processor is again allowed access to the memory bus. The hush bit however remains 
set for the interrupt routine to clear. The state of the comms processor hush register 
is readable by the main processor. 

Bit Hush bit 

Bit 1 HushError interrupt bit 

On a HushError the hush bit is cleared and the HushError interrupt bit set The comms 
processor can proceed but the interrupt routine is required to clear the interrupt bit 
The interrupt routine can be executed on the comms processor. 



mefcO S1002-10M102.04 62 



EXCEPTIONS, TRAPS, AND 
INTERRUPTS 



During operation of the Elan Communications processor various exceptions may 
occur. These exceptions are divided into two classes: processor exceptions, and 
interrupts. 

G.l Processor Exceptions 

Each of the six processors may meet an exception. This is indicated by a six bit code 
in the bottom bits of the processor's status register. These bits are ORed together to 
form an interrupt line in the interrupt register. In this way an exception can cause an 
interrupt on the the main processor or comms processor, and have a trap handler deal 
with the exception. 

Table of Exceptions by Processor on which they Occur 

Cmd Input Reply Thread DMA Code 

NOJTRAP 

DATAJ^CCESSJEXCEPTION 

OUTPUT JIWALID.PROCESS 

OUTPUT_INVALID_ROUTE 

OUTPUTJTIMEOUT 

EVElNrrj2UEUE_OVERFLOW 

EVEOTJNTERRUPT 

QUEUEJ)VERFLOW 

UNIMPLEMENTED 

PACKET_SEQUENCE_ERROR 

DPROC_FAIL_COUNT_ERROR 

IPROC_NACK_AFTER_ACK 

RPROC NACKED 



X 


X 


X 


X 


X 


0x00 


X 


X 




X 


X 


0x04 






X 


X 


X 


0x08 






X 


X 


X 


0x09 






X 


X 


X 


OxOB 




X 




X 




OxOC 


X 


X 




X 


X 


0x12 


X 


X 




X 


X 


OxOD 




X 




X 
X 


X 
X 


0x10 
0x11 
0x14 




X ' 


X 






0x06 
OxOC 
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NOTE 



The MEM_ADDRESS J*OT_ALIGNED exception maps onto a 
DATA_ACXESSJEXCEPTION with the fault type indicating an alignment error. 

G.2 Interrupts 

The communications processor generates a single external interrupt and an internal 
interrupt These are both maskable and have the same source interrupt register, the 
interrupt being the logical OR of the masked interrupt register. The interrupt register 
is made up from the following bits. 

BitO RESERVED 

Bit 1 command processor exception 

Bit 2 inputO processor exception 

Bit 3 inputl processor exception 

Bit 4 reply processor exception 

Bit 5 thread processor exception 

Bit 6 DMA processor exception 

Bit 7 (Alarm register < 0) See Appendix F 

Bit 8 Hush Time Out See Appendix F 

Bit 9 External Device Interrupt 1 

Bit 10 External Device Interrupt 2 

Bit 11 External Device Interrupt 3 

Bit 12 External Device Interrupt 4 

Bit 13 External Device Interrupt 5 

Bit 14 External Device Interrupt 6 

Bit 15 External Device Interrupt 7 
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Bit 16 (TProc&RProc&DProc) all halted 

Bit 17 Both inputters finished processing packets and halted 
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MMU USER GUIDE 



The Elan MMU is based on a paged MMU. The protections required over the ELAN 
network are different to those of a standalone memory system. 



H.1 MMU Fault Status Register Bits 

The following table defines the meaning of bits in the MMU FSR. 



Bit no Name 



31to29 


Not used 


28 


LastError 


27to22 


Not used 


21 


Swapped Since 




Last Access 


20 


BlockAccess 


19 


WordAccess 


18 


InstructionAccess 


17tol3 


Not used 


12 


Enor3 


11 


Error2 


10 


Errorl 


9to8 


Level 


7 


Event access 


6 


Remote access 


5 


WriteNotRead 


4to2 


Fault type 


1 


FAV 





Not used 



meaning 

read as zero 

any type of MBus Bus error 

read as zero 

Set if memory error was detected 

after a uCode swap. 

32 byte block access 

4 byte access 

Read of instructions 

read as zero 

MBus Error type 3 Uncorrectable 

MBus Error type 2 Timeout 

MBus Error type 1 Bus Error 

MMU level value. Zero if bus error 

Set if event access 

Set if Read or write for an inputter 

Set if a Write access 

MMU fault type (FT code) 

Fault address valid 

Read as zero 
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The following table defines the meaning of the MMU fault type bits in the MMU 
FSR. 

FT code Fault types 






Not used 


1 


Invalid address error 


2 


Protection error 


3 


Not used 


4 


Translation error 


5 


Access bus error 


6 


Not used 


7 


Address alignment Error 



H.2 Access permissions 

Access type is encoded as a three bit field: 

Bit means it is a write not a read. 

Bit 1 means it is remote not local. 

Bit 2 means it is occurring for event handling. 



There are two levels of local access permission: readonly and readwriteevent 
There are five levels of remote access permission: noaccess, readonly, readonly- 
oreventjeadwrite, readwriteevent Remote permission can never be higher than local 
permission, so the combined codes are: 
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Access 000 001 010 Oil 100 101 110 111 

Type local local remote remote local local remote remote 

read write read write evread evwrite evread evwrite 

Permission 



Null 

localRead 

read 

noremote 

remoteread 

remote write 

remoteevent 

remoteall 



fl 
f 



fl 

f 

f 

f 

f 



fl 

f 

f 

f 

f 

f 



fl 

f 

f 

f 

f 

f 



Key: f — fault 

fl -+ Acknowledge but ignore (used for broadcasts) 

Accesses marked as fl causes the memory cycle to fail but return a seperate error 
code. If an input process receives this error code it treats the operation as if it had 
succeeded and will cause an ACK to be sent The access does not occur. This is 
used to implement non-contiguous broadcast sets. This is required to enable the 
recombining ACK/NACK logic to work successfully. 



H.3 Flushing 



A 



The ELAN MMU is flushed by writing 32'hO to the FLUSH external register. Writing 
values other than zero will cause destructive operations to the TLB but without 
disabling the entries, these operations being reserved for testing the TLB. 



More for this section Please 
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ELAN CONTROL WORD 



1.1 Introduction 



Several control configuration parameters are required to be set in the elan processor. 
Some of these are relevant to the individual processes and appear in the respective 
status registers. Other overall control parameters are set in the Control Word, this 
document describes these parameters and their default behaviour. 

The control word bits have the following meaning: 

Bit [0] R/W MMUEnabled 

Active High. Reset to zero. 

Bit [1] R/W SER 

Software External Reset SER is connected to an output pin on the device an can be 
used as a single output bit 

Bit [2] R/W SIRout 

Causes internal reset of Elan processor. The control word is not reset except for this 
bit 

Software Internal Reset 

Bit [3] R/W CProcReset 

Resets CProc. Used during CProc exception trap handler to clear pending exception. 
Reset to zero 
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Bit [4] RO CProcError 

CProc has trapped. This is cleared using GProcReset, to enable the processor to be 
restarted the command port needs to be emptied by reading and saving the active 
commands. 

Bit [5] R/W HaltOthers 

Setting this bit will halt the three queue processors, rproc, tproc and dproc. The halt 
points for the three processors are the same, Wakeup Always AND the output's not 
open OR Wakeup Runnable, if the processor gets de-scheduled and the above is true 
the processor will NOT be rescheduled till HaltOthers is written low. 

Reset value is zero. 

Bit [6] R/W Haltlnputters 

Setting this bit will halt both inputters The halt points for the both processors are 
the same, process EOR ie Only trap when not in middle of packet, if the processor 
gets de-scheduled and the above is true the processor will NOT be rescheduled till 
Haltlnputters is written low. 

Reset value is zero. 

Bit [7] RO Other sHalted 

When the HaltOthers bit is set this flag indicates whether the processes have halted. 
Only when ALL three processes have halted in the state required by the halting 
function will this bit be set 

Bit [8] RO InputsHalted 

When the Haltlnputters bit is set this flag indicates whether the processes have halted. 
Only when BOTH input processes have halted in the state required by the halting 
function will this bit be set 

Bit [10:9] R/W PerfCont 

Performance Meter control, to enable in situ speed testing of the Elan processor 
the device has a number of ring oscillators which can be multiplexed to drive the 
Seconds counter instead of the FiveMegQock input These oscillators are designed 
to each exercise a different parts of the silicon processing of the Elan chip. The rate 
of oscillation will vary from device to device, and for a particular device temperature 
and supply voltage. 
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Reset value is 00. 




PerfCont 


Function 


Tick Rate 


00 


Normal Gock 


1 Second 


01 


NAND 


< 2.942 uS 


10 


NOR 


< 2.982 uS 


11 


INV and track 


< 7.863 uS 



The NAND and NOR functions are ring oscillators built from NAND into INV and 
NOR into INV primitives respectively. The NAND function will test the strength of 
the n-type transistors and the NOR function the p-type transistors. 

The INV and track test is simply inverters driving near maximum ratio track load. 
This test calibrates the drive to track load for the process. 

The given tick rates are the maximum simulated. For devices within process 
specification the tick rate will be less than the above. 

Bit [11] RO LinkOlnReset 
Bit [12] RO LinkllnReset 

These bits output the Reset state of the link. If a link is in reset for any reason, then 
its LinkNInReset bit will be set. 

Bit [13] R/Clearable LinkOError 
Bit [14] R/Clearable LinklError 

These bits will become set if either a line data error or a line phase error is detected. 
The error is cleared by writing a ' r to the same bit location. 

Bit [15] R/W SyncCProc 

If this bit is set then before each command is executed a check is made to ensure that 
there are no outstanding MBus errors. This will guarantee that when wakeup never is 
written to a status register the value returned by the ext reg read mod write includes 
all possible traps. Trap code will not update the status register after the ext write. 

Bit [31:16] Reserved 

Read as zero. 
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COMMUNICATIONS PROCESSOR 
MEMORY MAP 



The communications processor memory map is divided into two parts, slave device 
addresses and external data structures. 

The slave locations appear in the Mbus mapping for the Han device. These locations 
control functionality and give access to the clock and alarm registers. 

The external data structures are areas of store used by the Elan when running for 
queue and trapping information. 

J.l Slave Device Locations 

All slave device locations are fixed with respect to physical address OxFFiOOOOOO, 
where i is the 4 bit MBus ID driven onto the Elan ID pins. The Elan does not respond 
to all addresses in this range. Those which are unmapped will cause a MBus timeout 
when accessed across the MBus. 

J.l.0.1 Configuration and ID registers 

OxFRFFFFFC Mbus Device ID Read Only 
OxFHFFFFFO Control register Read/Write 

The Mbus Device ID defines the implementation and revision of the device. This 32 
bit value has 4 fields defined by the MBus specification. 



Bits Wide 


Name 


Arev 


BRev 


CRev 


16 


Implementation 











8 


Device type 


9 


9 


9 


4 


Revision 





1 


2 


4 


Vendor code 


Oxf 


Oxf 


Oxf 



The MBus Device ID for the C revision Elan is therefore 0x00000921 



ItekD S1002-10M102.04 72 



Appendix J Communications Processor Memory Map 



J.l.0.2 Interrupt ID registers 

Main processor interrupt mask 

Single word of store 

0xFFiFFFFE8 
Interrupt register 

Single word of read only store 

0xFFiFFFFD8 

Comms processor store objects 
Command port 

256K area of store 

Mappable to 4K boundaries providing 63 commands 
one 

in each page + page (user read only) which 
provides 

info Single page non cacheable 

0xFFiF80000 

Clock Hi/Lo 

Two words of store 

OxFFiFFFFCO Seconds word access R/W 

0xFFiFFFFC8 nS word access R/W 

OxFFiFFFFEO {Seconds, nS} double word access, 
READ ONLY 

Note little endian addressing is same as big 
endian 

Hush register 

4K area of store (User read/write/ remote) 
Single page non cacheable 
OxFFiFFEOOO (Read/write) 

Alarm register 

Single word decrementing counter 
OxFFiFFFFDO 
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J.2 Main Store Used by Comms Processor 

J.2.1 Queues and Exception Areas 



The QUEUE_STORE_B ASE register defines the area of store used by ELAN. This 
register is is split into two fields, the upper 27 bits, QUEUE_STOREJBASE[31:5] 
are masked and shifted up 4 bits to become the 36 bit physical address, 
Q_STOREJBASE[35:0]. The lower 5 bits QUEUE__STORE_BASE[4:0] are used 
in to produce QJSIZE defined as (1 « QUEUE_STOREJBASE[4:0]). Q.SIZE 
denotes the size of the processor queues in units of the queue descriptor size ie 32 
byte chunks. 



Memory Usage 

Reply Queue 
Thread Queue 
DMA Queue 

Command Exception 
Input Exception 
Input 1 Exception 
Reply Exception 
Thread Exception 
DMA Exception 

Example: 



Size(Bytes) Start Address 

Q_SIZE*32 QJSTOREJBASE + 

Q_SIZE*32 Q_STORE_BASE + (Q_SIZE * 32) 

Q_SIZE*32 QJJTOREJ3ASE + 2 * (Q_SIZE * 32) 

16 O.STORE3ASE - (16 * 1) 

16 Q_STORE_BASE - (16 * 2) 

16 Q_STORE_BASE - (16 * 3) 

16 Q_STORE_BASE - (16 * 4) 

16 Q__STORE_BASE . (16 * 5) 

16 QJSTOREJBASE - (16 * 6) 



After reset the internal register QUEUE_STORE_BASE say is set to 0x045. This 
gives values to Q_STORE_BASE of 0x400 and QJSIZE of 0x20. Each queue then 
has a size of IK bytes and the first queue, that of the reply processor, appears at the 
address 0x400. The exception area is just below this address. These values in the 
above equations give the following addresses in the first 4K of store. 



Memory Usage 

DMA Exception 
Thread Exception 
Reply Exception 
Input 1 Exception 
Input Exception 
Command Exception 
Reply Queue 
Thread Queue 
DMA Queue 



From 

0x0000003a0 
0x0000003bO 
0x0000003c0 
0x0000003d0 
0x0000003e0 
0x0000003f0 
0x000000400 
0x000000800 
OxOOOOOOcOO 



To 

0x0000003af 
0x0000003bf 
0x0000003cf 
0x0000003df 
0x0000003ef 
0x0000003ff 
0x0000007ff 
OxOOOOOObff 
OxOOOOOOfff 
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A 



J.2.1.1 Queues Structures 

A queue item has an entry size of 32 bytes aligned on 32 byte boundary. 

The processor queues are accessed by common routines which check for overflow and 
contextO overflow. The only exceptions which can be generated by these routines are 

QUEUEJDVERFLOW, DATA_ACCESSJ2RROR. 

The three types of descriptor have the following format 

These should be removed from processor descriptions 

J22 Translation Tables 

Comms Process ID tables. 

Context Virtual Base and Size Tables 

2 words per context (8 * 64k bytes => 512K) 
Virtual Process Tables 

(one per active context) 

(Sparse and indirected) 
MMU translation tables. 

Root table (64K * 4) => 256K bytes 

pointed to by context table ptr 
Page Tables (Sparse and indirected) 

Route tables. (One of) Variable (4M max) 
Size defined by contents of Virtual process 
tables . 

Base defined by register 

53,3 Internal Interrupt Event Vector 

Size is 32 events which are all pairs of words. Base of vector is denned by internal 
register INTERNAL_INTERRUPT_BASE. Each vector element corresponds to an 
internal interrupt 
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J.3 Memory Map of Slave Word Devices 

OxFRFFFFFC Mbus Device ID 



0xFFiFFFFF8 

0xFFiFFFFF4 

OxFFiFFFFFO 

OxFFiFFFFEC 

0xFFiFFFFE8 

0xFFiFFFFE4 

OxFFiFFFFEO 

OxFFiFFFFDC 

0xFFiFFFFD8 

0xFHFFFFD4 

OxFFiFFFFDO 

OxFFiFFFFCC 

0xFFiFFFFC8 

QxFFiFFFFC4 

OxFFiFFFFCO 

OxFFiFFEOQO 

OxFRF80000 



Readonly 

Mbus Device ID Read Only 

Control register Read Only 

Control register Read/Write 

Main processor interrupt mask Read Only 

Main processor interrupt mask Read/Write 



Interrupt register Read Only 

Interrupt register Read Only 

Alarm register Read Only 

Alarm register Read/Write 

Clock Lo Read Only 

dock Lo Read/Write 

Clock Hi Read Only 

dock Hi Read/Write 

Hush register 4K area of store Read/Write 
Command port 256K area of store Read/Write 



J.4 Memory Map of Slave Double Word Device 



0xFFiFFFFF8 Mbus Device ED (twice) Read Only 

OxFFiFFFFFO Control register (twice) Read Only 

0xFFiFFFFE8 Main interrupt mask (twice) Read Only 

OxFRFFFFEO Clock (Hi,Lo) Read Only 

0xFFiFFFFD8 Interrupt register (twice) Read Only 

OxFFiFFFFDO Alarm register (twice) Read Only 

0xFHFFFFC8 Clock (LoJ-o) Read Only 

OxFFiFFFFCO Clock (Hi Jfi) Read Only 
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MEIKO BYTE-WIDE LINK 
LINE-PROTOCOL 



K.1 Link Connection 



The basic characteristics of Meiko links is that they aie; byte wide; bidirectional; 
point to point; and high bandwidth (>50 MBytes/s each direction). Each link consists 
of 20 wires; 10 for the input port, and 10 for the output port Each port has one clock 
wire and nine data lines. On both positive and negative transitions of the ChanClkln 
wire the Chanln wires are sampled. The output port sets up a new data pattern on 
ChanOut at the start of each communications clock period, and toggles Chandkout 
in the middle of each period. 



output port 

ChanClkOut 
ChanOut [8] 



input port 

ChanClkln 
Chanln [8] 



ChanOut [0] 



Chanln [0 ] 



K.2 Link Values Encoding 

The line protocol has eight command values, as well 256 data values encoded. No 
single bit error can change data into a command or visa-versa. No single bit error can 
change one command into another. 

The commands are as follows: 



Command Code 

NULL {3'h7,3'h0,3'h0} 
GAP {3'h7,3'hl,3'hl} 

SOP {3'h7,3'h2,3'h2} 



Usage 

Nothing to be sent. 
Used for bit stuffing to get 
receiver in sync with sender. 
Start of packet. 
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EOP {3'h7,3'h3,3'h3} 

TOKEN {3'h7,3'h4,3'h4} 

PNACK {3'h7,3'h5,3'h5} 

PACK {3'h7,3'h6,3'h6} 

RESET {3'h7,3'h7,3'h7} 



End of Packet. 

Receiver can accept 16 more 

bytes of data. 

Packet Not Acknowledge. 

Packet Acknowledge. 

Sender is in reset. 



The order of priority of sending commands and data is shown below: 

Highest priority Lowest priority 

RESET PACK TOKEN EOP Data GAP NULL 



PACK 
PNACK 



TOKEN EOP 
SOP 



The outputer attempts to output a GAP every 256 cycles. If having waited 128 cycles 
a GAP has still not been sent because the line has been continuously busy, then a 
GAP is sent in preference to data. When a GAP command is transmitted, it must be 
followed by a NULL command. The NULL following a GAP is higher priority than 
everything except the RESET command. 

The data bytes are encoded in four ranges, as follows: 

8'hOO - 8'h3f have the value {3'b000, Data[5:0]} 

8'h40 - 8'h7f have the value {3'bOOl, Data[5:0]} 

8'h80 - 8'hbf have the value {3'b010, Data[5:0]} 

8'hcO - 8'hff have the value {3'blOO, Data[5:0]} 

At least two of the top bits would have to change before the data byte could possibly 
be interpreted as a command byte. Errors which change the data values into other 
data values must be detected by error checking the packet contents at the packets 
destination. 

Packets are made up of route bytes, a SOP, one or more transactions, and one or two 
EOPs. One PACK or PNACK is returned for each packet sent The line protocol does 
not distinguishin any way between packets which have been PACK or PNACKed. If 
a packet is terminated prematurely by a line data error, the packet is terminated with 
an SOP EOP. This signals to the inputer that the packet was incomplete. 

RESET, TOKEN, GAP and NULL are only used by point to point links and are 
invisible to higher levels of protocol. 
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K3 Flow Control 



The input port has a FIFO. Data bytes and EOP commands can be stored in the FIFO 
while a switch or inputer is unable to take the data. The protocol allows the FIFO to be 
as deep as necessary to take up the delays in the line, but in the first implementations 
of links it is intended that the FIFO be 48 bytes deep. The size of the byte count 
register must be sufficient to cope with the largest FIFO it can be connected to, which 
may be greater than its own FIFO size. In the initial implementation this will be 8 
bits, allowing FIFOs up to 256 bytes to be connected. 

Any time an input port has 16 or more bytes of space in its FIFO, it instructs its output 
port to send a TOKEN command and decrements its space available count by 16. TTiis 
effectively transfers the ownership of 16 bytes of FIFO space from the inputer to the 
outputter connected to it Each byte consumed by the inputer frees up one byte of 
space in the FIFO. Note that the effective size of the FIFO is reduced by between 
and 15 bytes at any point in time because of the 1 6 byte granularity of the token. 

When an inputer receives a TOKEN command it instructs its paired output port to 
increment the count of the number of bytes it may send by 16. Each time the output 
port sends a byte it decrements this count by one. As long as the count is greater than 
zero the output port is allowed to transmit data, or commands that consume FIFO 
space, (EOP or SOP). 

Data is transmitted as packets. All packets must end with an EOP command. All 
packets must be either PACKed (Packet Acknowledge), or PNACKed (Packet NOT 
Acknowledge). Acknowledgements are passed back along the route that the packet 
took. If the outputting processor traps while it has its output open it will immediately 
send an EOP. An EOP generated in this way is termed an unsolicited EOP. If an 
Elite switch chip detects an error, then it will generate a BAD EOP command (this 
is an SOP immediately followed by an EOP command). A BAD EOP can be 
generated before a PACK/PNACK and hence be interpreted as an unsolicited EOP. 
The system must ensure that any PACK or PNACK being returned for that packet 
is not interpreted as being for a following packet To ensure this, unsolicited EOP 
commands are handshaken in the following way. The EOP command can be issued 
before a PACK/PNACK has been received, but another packet cannot be transmitted 
along the line before the PACK/PNACK is received. The line is kept open for a 
packet until both the EOP is sent and the PACK/PNACK is received. If a receiver 
port receives an EOP before the transmitter has sent a PACK/PNACK then a PNACK 
is automatically sent by the transmitter. Any PACK or PNACK commands received 
after an EOP has gone by, and before another packet has started, are deleted. 
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K.4 Links and Reset 

A link is held in reset when: 



1 The Reset pin of the chip is high. 

2 A JTag port holds the link in reset. 

3 The input port is not receiving a clock from the line. 

4 The input port is receiving RESET commands from the line. 

5 The outputter has been disabled by a JTag port. 
A link is put into reset for at least 256 clock cycles when :- 

1 The value clocked in on Chanln is neither a command nor Data. 

2 The inputer has a phase error. (Due to excessive drift, jitter or double 
clocking on the ChanlnClk) 

3 The link needs to be cleared because of a timeout 

4 A JTag port clears the link. 

When a link is put into reset the link has a defined state. Thisis:- 

1 No packets are being sent in either direction. 

2 No PACK/PNACK is outstanding. 

3 The flow control FIFO is empty, but the receiver owns all the space in 
the FIFO. i.e. TOKEN commands must be sent before any data can be 
received. 

4 The transmitters count of bytes it may send is set to zero. 

5 Any packets being sent when the link was put into reset are completely 
consumed, and if possible, NACKed. 

6 Any packets being received when the link was put into reset are ended 
with a BAD EOP (SOP, EOP); 
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7 The link is forced to output the RESET command, except if the link is 

reset by receiving a RESET command. If the link is receiving the RESET 
command then the transmitter sends the NULL command. 

If an outputeris in the midst of sending a packet when it is put into reset, the remainder 
of the packet is consumed but not transmitted, and a PNACK returned to the sender. 
The outputter will consume the rest of the packet up to the EOP, even if the link is 
taken out of reset New packets arriving at a link which is in reset are held until the 
link is taken out of reset 

If an inputer is receiving a packet when it is reset it forwards a SOP, followed by 
an EOP. If this occurs while route information is being sent this forces the message 
to terminate. If the error occurs during the data part of the packet the SOP is passed 
in to the inputing communications processor, which detects it as an error which does 
not require acknowledgement. 

This mechanism insures that reset can be forced at any time on a port in a way which 
can be detected and recovered from by all devices (either processors or switches) 
using that port The reset mechanism will always reset the link in both directions; 
this is essential as no direction can be deduced from an erroneous command or data 
item. Reset is however only sent in one direction across the link, form the end that 
detected the error. Resets are propagated along the currently connected routes so that 
an entire blocked packet is flushed out 

K.5 Clock Skew Tolerance 

The output port generates both the data and the Clock. Any variation in voltage, 
temperature, or process, should not cause the skew between the clock and data, as 
seen by the receiver, to vary. The data is clocked on both positive and negative edges 
of the clock. Therefore the maximum frequency of any pin is half the peak data rate. 

K.6 Clock Phase Locking and Control 

Both directions of a link must transmit at the same frequency. To avoid having to 
distribute a global clock, marginal (<200ppm) frequency variations are permitted. 
Receivers use the same frequency as their transmitters, and so have an unknown, 
and slowly changing phase difference with respect to the data coming in on the line. 
Inputters can correct for this by inserting or removing NULL commands. The points 
at which corrections can be made are signalled by GAP commands. GAP commands 
must be sent often enough to insure that the maximum frequency drift is always 
compensated for before it can cause phase errors. 
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Each receiver has a phase detecting circuit and a short FIFO. Data is clocked into the 
FIFO using the clock sent with the data. The data is clocked out of the FIFO using 
the receivers local clock. The FIFO is three entries deep. The minimu m possible 
latency through the FIFO is zero cycles, and the maximum possible latency through 
the FIFO is three cycles. Hie receiver monitors the latency through the FIFO, and 
tries to maintain a 1 to 2 cycle latency. At regular intervals, the sender transmits 
a GAP command. The GAP command is always followed by a NULL command. 
When the receiver receives a GAP then, if the measured latency is greater than two, 
because of clock drift or transmission delay drift, then in one cycle the receiver can 
remove both the GAP and the NULL from the FIFO. This will reduce the latency 
though the FIFO by one cycle. If, when the receiver receives a GAP, the measured 
latency is less than one cycle, then the receiver does not take anything out of the FIFO 
for one cycle. This will increase the latency by one cycle. 

K.7 Automatic Link Output Tri-state 

The link has an automatic link output tri-state function. This is included to enable 
hot insertion of circuit boards within a switch network. When a link is operating 
normally, it will be receiving an edge on the ChanClkln pin every clock cycle. If the 
link is disconnected then the ChanClkln will stop oscillating. A very weak pull down 
resistor on the ChanClkln pin will pull the input to ground. If the ChanClkln pin is 
read as zero without being read as a one for at least 256 Comms cycles, then all the 
output pins of the link out will be tri-stated. The ChanClkOut pin has a weak (but 
not very weak) pull up resistor. So if a board is re-inserted, and the link connection 
made again, when the power is restored the ChanClkln pins of both ends of the link 
will be read as a one (because the weak pull up resistor wins over the very weak pull 
down resistor). When the ChanClkln pin has been read as a one for more than 10ms 
the link output pads will be taken out of tri-state. While a link is in tri-state, the link 
is held in reset. Links will be automatically tri-state and untri-state regardless of the 
state of Chip Reset The only exception is if the link is being boundary scanned using 
the TAP. In this case the link will be forced out of tri-state. 
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There is one status register for each of the six processes implemented on the 
microengine. Each of these status registers is mapped as an external register. This 
may be read/written via the command port at the external register addresses detailed 
in Appendix M "Elan External Registers". 

On exceptions the status register, at the point at which the error is taken, is written 
to the exception area for the respective process. The position the exception areas are 
defined in Appendix J "Communication Processor Memory Map". 

Bit 31 OutputOpen. (Read Only) 

This bit means that the processor has the output channel open. This can only be 
set for the reply, DMA, or thread process. On an error, the microcode closes any 
open output devices with an EOP error. If this bit is set in the Status Register 
as written to the exception area it means that the processor had the output open 
when the error occurred. 

Bit 30 Reserved 

Bit 29 Reserved 

Bit 28 TrapOnTransBit 

This bit only has meaning for the input processors. When this is set each 
transaction will cause a trap to the main processor, to allow single stepping 
through a packet A trap will occur on each subsequent received transaction. 

Bit 27 Runable. (Read Only) 

For queued devices such as the thread, reply and DMA processors, this means 
that die device may have more work to do. Runnable for these devices means 
that QueueReady is set, and at least one of the Run bits is set On seeing 
Runnable the processor will attempt to run the top item of the queue. If the 
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queue is empty or the top item is a non context process and only context are 
runnable then the QueueReady bit is cleared. 

Runnable for the command processor means there is a command waiting in one 
or other of the command registers or an internal interrupt is pending. 

Runnable for the input processors means there is a transaction waiting to be 
processed. 

Bit 26,25 RunNonContextO, RunContextO 

These two bits determine the run-ability of any request to the processors. Each 
queue item or input transaction has a context associated with it, the action will 
only be executed if the context type matches with one of these bits which is set 

For queued devices these are used to evaluate whether or not to run the top item 
of the queue. 

For the inputters these bits cause the automatic Nacking of the respective 
transaction types. 

If these bits have to be changed then the following procedure should be used :- 

The status register should be written with the wakeup function set to never using 
cp^RmW.extemal. The new value of RunNonContextO and RunContextO 
is then inserted in the returned value and then written back again using 
cpJRmW_external with the original wakeup function. For this to work 
correctly on iprocO and iprocl the SyncCProc bit should be asserted in the Elan 
control register. 

TheRunNonContextO and RunContextO bits have no meaning for the command 
processor. 

Bit 24 QueueReady 

This bit is set any time a process is put on the queue. Any time the main 
processor asserts one of the Run bits it should also reassert QueueReady, so that 
the processor can check whether this has caused the queue to become Runnable. 
Note that this bit DOES NOT MEAN THE QUEUE IS EMPTY/FULL. It is 
simply an indication to the microengine to attempt to run the item at the top of 
the queue. 
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Bit 23 Reject - inputters only (Read Only) 

Reject is turned on automatically when a Nack is sent before an AckNow bit 
has been detected in a packet This causes all further transactions in a packet 
to be thrown away. 

Bit 22 AckSent - inputters only 

An Ack has been sent for a packet This is important for the input exception 
handlers as it means that any transactions which occur after the Ack has been 
sent must be handled by the exception handler, or if they are thrown away, a 
higher level protocol must insure retransmission. 
Readonly. 

Bit 21 BadCRC - inputters only 

Transaction with a BadCRC has been received. This causes an exception. If no 
Ack has been sent Reject will be turned on. Exception handler must restart the 
device by writing appropriate wakeup function and suspend address to device. 
Readonly. 

Bit 20 SOP error - inputters only 

Packet was terminated by reset or another SOP (SOP EOP means end of packet 

error). 

Readonly. 

Bit 19 ProcessingPacket - inputters only 

This bit will be set in the status register for all transactions other than EOP. ie 
when TrapOnTrans is asserted it can be used to indicate that the EOP has been 
received. 
Readonly. 

Bit [18:16] WakeupFn 

Valid functions are: 

WakeupNever Sleep 

1 WakeupAlways Swapped out but going to run again 

2 WakeupRunable Waiting for queue item 
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Following for outputing processors only: 

3 WakeupOutputReady Waiting for output and at least two trans space. 

4 WakeupAckNAckOrTimeout Waiting for Ack on output 

5 WakeupTwoTransSpace Waiting for Space 

Programming note. It is possible for a memory error to cause a process to trap with 
the wakeup function set to WakeupAckNAckOrTimeout If this is seen then the trap 
code must write to that processes local external QearAckNAck location so that the 
AckNAck buffer that was being used by processes for outputing a packet is freed up 
again. 

Bit [15:6] SuspendAddr 

This is the suspend address of the micro-process which is controlling the 
processor. Valid suspend addresses are denned in L.l. The field is 10 bits 
wide of which Elan 1.0 only uses 9 bits. 

Bit [5:0] TrapType 

Meaning of TrapType is discussed in Appendix G on "Exceptions". 

The reset state of the status registers is all zero except with the run contextO bits 
enabled, and the inputters enabled. 

L.1 Micro-Process Suspend Addresses 

The implementation of the six Elan processes is as six micro-processes timesharing 
the same micro-processor. Each executes micro-code from a ROM, to achieve high- 
density and functional integrity they share areas of this ROM. 

It is necessary for trap handling code to sometimes alter the SuspendAddr, which 
is the micro-code pointer to where execution will restart. Only certain micro-code 
addresses are valid restart points, each having a different usage. These are listed in 
the table below. This is achieved by changing the SuspendAddr field in the respective 
status register. 

AddrJSfullMWord 0x001 

Suspend point for idle process. Setting a process to this value and runnable 
will cause it to idle but at expense of lower priority processors which will not 
be able to execute at all. 

On an exception the SuspendAddr is set to this value by exception microcode. 
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Addr_DequeueDMA 0x010 

Reset value for DMA Processor. Place to restart DMA after removing 
descriptor. DMA processor will begin executing descriptors off the DMA 
queue. 

Addr_AfterSwapSPARCEntry 0x018 

Timeslice point while executing thread code. Thread processor timeslices 
between each instruction. PC, nPC, condition code bits and registers are all 
valid. 

AddrJDequeueThread 0x020 

Reset value for Thread Processor. Place to restart thread after removing 
or deleting executing descriptor because of exception. Processor will begin 
executing descriptors off the thread run queue. 

AddrJExecuteCommand 0x028 

Reset value for command processor. Only place to restart command processor. 

Addr__DequeueReply 0x030 

Reset value for reply Processor. Place to restart reply processor after removing 
or deleting executing descriptor because of exception. Processor will begin 
executing descriptors off the reply queue. If the reply buffer is marked as valid, 
it will be used as the first reply descriptor. 

Addr_NoThereIsnt 0x034 

Reply suspend value while waiting for output to become free or for enough 
space to appear in output buffer during execution of a descriptor. All reply 
descriptors are read into the reply buffer during their execution. 

AddrJDoReply 0x036 

The reply processor waits here when it has dequeued a descriptor into the reply 
buffer and is waiting for the outputter to have enough space in its buffer, or 
ownership of the buffer. 



mekO S1002-10M102.04 87 



Computing Surface 2 



Addr_DoDmaLoopRead 0x0 6c 

The dma processor waits here, between transactions on a dma packet sent into 
the network, for output buffer space to become available and to let higher 
priority processes in. 

Addr_WakeupForOpen 0x040 

The thread processor waits here for space in th output buffer or ownership of 
the buffer. 

Addr_ReplyCheckAck OxOee 

Reply processor suspend address while awaiting an acknowledge in response 
to a packet. It is unlikely that this value will be seen, as the link logic will 
always supply an acknowledgement (packet Ack, NAck or timeout). However 
if this value is seen, and rproc has been stopped, then the trap code must write 
to rprocs local external DearAckNAck to free off the Ack buffer. 

Addr_CheckAckNAckTimeout 0x106 

DMA processor suspend address while awaiting an acknowledge in response 
to a packet It is unlikely that this value will be seen, as the link logic will 
always supply an acknowledgement (packet Ack, NAck or timeout). However 
if this value is seen, and dproc has been stopped, then the trap code must write 
to dprocs local external QearAckNAck to free off the Ack buffer. 

Addr__SendDMATransactions OxlOe 

The dma processor waits here for output buffer space and higher priority 
processes before preparing the first transaction of a dma packet to be sent into 
the network. 

Addr__DoLoopRead 0x127 

The dma processor waits here during memory memory DMAs to let higher 
priority processes in. 

Addr_WakeupForSendTrans 0x130 

The threads processor waits here for enough output buffer space before 
executing a Send trans instruction. 
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AddrJDMASentOK3 0x15b 

The dma processor waits here for the outputer to become free between packets 
of a multi packet network dma. 

Addr_HandleTrans cation 0x163 

Normal wakeup address for inputters. Inputter will wake up a valid transaction 
appears in the input buffer. 

Addr_MoveToNext Transact ion 0x167 

Inputter will wake up and move to the next transaction in the input buffer. 

Addr__WakeupForClose 0x176 

Thread processor suspend address while awaiting an acknowledge in response 
to a packet If this value is seen, and tproc has been stopped, then the trap code 
must write to tprocs local external Clear AckNAck to free off the Ack buffer. 

Addr_DoSendDMAEOP 0xla5 

The dma processor waits here for enough buffer space to send the eop of a dma 
packet 

Addr_SendNullTransaction Oxlbb 

DMA processor is waiting to send a null transaction at the end of packet with 
AckNow bit set to ensure an acknowledge has been sent 

Addr_NAckAndMoveToNextTransaction 0xle3 

Inputter will wake up and attempt to send a packet NAck in response to the 
current packet If successful the pointers are correctly moved on in response 
to the current transaction. If the packet has already been acknowledged a trap 
will occur. 

Addr_DMASentOk Oxlfl 

The dma processor waits here to let other processes in on a memory memory 
dma before updating the descriptor. 
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AddrJResetMWord 0x000 

This microword is not a suspend address but is used in conjunction with the 
NullMWoid by the idle microengine process. 

Addr_InputterEntry 0x008 

Reset value for inputters. Inputters power up in this state but with Runnable set 
to Always. This location resets the input buffer pointers and sets the wakeup 
function to Runnable, then suspends itself at Addr_HandleTranscation, ready 
to receive input transactions. 
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The following is a list of the external register values. The external registers are 
accessed by a 32 bit read and 32 bit write bus. Not all of the registers return 32 bit 
values. Where this is the case the extra bits are returned as zero. As for the internal 
registers some of the registers can be accessed locally. The local access address being 
formed with bits [7:5] coming from the processor identification number ProcessED. 
The read and write addresses are both 8 bits wide, for external register reads however 
bit [4] is ignored. Global accesses can be made to the external registers, these being 
made at addresses 0x00 - Oxlf. 

The external registers may have different read and write behaviour. 

M.1 External Register Definitions 

The following tables list the external register definitions. The tables consist of 2 
columns: the first column defines the registers meaning when it is written to; the 
second column defines its meaning when it is read. The two meanings may, or may 
not, be the same 

M.l.l Global External Registers addresses (0x00 - Oxlf) 

NoWrite 0x00 VAddrReg 0x00 

No write occurs if a write is Virtual address register, 

made to this address. Maps nowhere. 

ResetOutput 0x01 CurrAT 0x01 

Reset the outputter logic. Current MBus access type 

InputContextFilter 0x02 R/W location 

Bits [15:0] read and write the 
16 bit value of the input context 
filter. If inputs with this value 
of context are received they 
will be NACKed automatically. 
The NACKing will begin on the next 
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packet received. 

ReplyBuf ferValid 0x03 R/W location 

Bit[31] indicates validity of reply buffer. 



CurrContext 0x04 

Bits[15:0] read and write the 
current value of the hardware 
context being used by the MMU. 



R/W location 



LineAddr 

line address access to TLB 



0x05 



BootModeAndLineAddr 0x05 



LineData 

32 bit line data currently 
pointed to by TLB. 



0x06 



R/W location 



Flush 0x07 

Invokes flush function on TLB 



TlbAccessFault 0x07 

Reads TLB access fault type. 



WriteBlockSize 
WriteDataPtr 
ExtMemOataReg 
ReadDataPtr 

PhysAddrLo 
PhysAddrHi 



36 bit physical address register. Writing to the high part 
causes the top 32 bits to get written and the lower four bits 
to be zeroed. Writing the lower part merely writes the bits 
[31:0] of the physical address register. Reading the high part 
returns the bits [35:4] and the low part bits [31:0]. 



0x08 


R/W location 


0x09 


R/W location 


0x0a 


R/W location 


0x0b 


R/W location 


0x0c 


R/W location 


OxOd 


R/W location 



SetMagicBits 

Sets written data bits in 
magic field 



OxOe 



Last Access 



OxOe 



Read last memory access 

Bit allocation is 

((MMUs FSR as for mem error) « 3) 

| (uCode ProcID initiating access) 



CleaxMagicBits 
Clears written data bits in 



0x0 f MemException 0x0 f 

Read type of memory exception 
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magic field. 



PhysWordWrite 


0x10 


No read mapping 


PhysWordRead 


0x11 


No read mapping 


PhysBlockWrite 


0x12 


No read mapping 


PhysBlockRead 


0x13 


No read mapping 


TLBWrite 


0x14 


No read mapping 


TLBRead 


0x15 


No read mapping 



Do physical memory operation. 



VAddrRegReadByte 


0x18 


No read mapping 


VAddrRegWriteByte 


0x19 


No read mapping 


VAddrRegReadWord 


0x1a 


No read mapping 


VAddrRegWriteWord 


0x1b 


No read mapping 


VAddrRegReadBlock 


0x1c 


No read mapping 


VAddrRegWriteBlock 


Oxld 


No read mapping 


VAddrRegEventWord 


Oxle 


No read mapping 


VAddrRegLocalWriteWord Oxlf 


No read mapping 


Set virtual address register with given access type 



M.2 Locally Mapped Registers 

M.2.1 Command Processor Externally Mapped Registers (0x20 
-0x3f) 



WakeupFunction 



0x27 



Processld 



0x27 



Bits[2:0] writes wakeup function 
into status register of Cproc 



Read process ID of Command 
processor ie 0x00000001 



CommandData 



0x28 



R/W location 



32 bit register used by 
command port to transfer data. 



CommandFinished 



0x29 



CommandPort 



0x29 



indicate to command port that 
command processor has finished 
the current command 



Command from command port 
Bit [31] indicates that an 
internal interrupt is pending 



No Write mapping 



MaskedlnterruptReg 0x2c 
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Interrupt register bit wise ANDed 
wife internal interrupt mask 



CommsProcIntMaskReg 0x2d 



R/W location 



18 bit Internal interrupt mask register. 

StatusReg 0x2f R/W location 

32 bit Status register for 
Cproc processor. Fields within 
status register are defined 
in Appendix L 

M.2.2 IprocO Externally Mapped Registers (0x40 - 0x5f) 

InputReplyBufferlnc 0x40 InputReplyBufferPtr 0x40 



Writing will cause input reply 
buffer to be incremented if it 
is not owned by Iprocl. 



Bits [7:0] read value of current 
input reply buffer pointer. Bit [31] 
reads if input reply buffer is in 
use by Iprocl. 



InputReplyBufferReset 0x41 
Resets input reply buffer pointer logic. 



CondTransResult 



0x42 



Send ACK or NACK depending on value of previous subtract 
Used to make network conditional fast 



SendNAck 



0x43 



Send a NACK to the processor from whom we are currently receiving. 

CurrContextAndlnputCode 0x44 InputFirstCase 0x44 

32 bit write value which sets the Reads microcode case value for 
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current MMU context for IprocO and transaction being decoded given 

transaction code being decoded. state of IprocO. 



TransactionUsed 



0x46 NextTransFront 



0x46 



The input transaction has been 
processed 



Bits [7:0] read value of pointer 
to next transaction in IprocO 
buffer. 



WakeupFunction 

Bits[2:0] writes wakeup function 
into status register of IprocO 



0x47 Processld 



0x47 



Read process ID of IprocO 



StatusReg 



0x4 f R/W location 



32 bit Status register for IprocO 
processor. Fields within status register 
are defined in Appendix L 



M.Z3 Iprocl Externally Mapped Registers (0x60 - 0x7 j) 



InputReplyBuf ferine 



0x60 InputReplyBuf ferPtr 0x60 



Writing will cause input reply buffer 
to be incremented if it is not owned 
by IprocO. 



Bits [7:0] read value of current 
input reply buffer pointer. Bit [31] 
reads if input reply buffer is in 
use by IprocO. 



InputReplyBufferReset 0x61 
Resets input reply buffer pointer logic. 



CondTransResult 



0x62 



Send ACK or NACK depending on value 
of previous subtract. Used to 
make network conditional fast 
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SendNAck 



0x63 



Send a NACK to die processor from 
whom we are currently receiving. 



CurrContextAndlnputCode 0x64 InputFirstCase 



0x64 



32 bit write value which sets the 
current MMU context for IprocO and 
transaction code being decoded. 



Reads microcode case value for 
transaction being decoded given 
state of Iprocl. 



TransactionOsed 



0x66 



NextTransFront 



0x66 



The input transaction has been 



Bits [7:0] read value of pointer 
to next transaction in Iprocl 
buffer. 



WakeupFunction 

Bits[2:0] writes wakeup function 
into status register of IprocO 



0x67 



Processld 



0x67 



Read process ID of Iprocl 



StatusReg 



0x6f R/W location 



32 bit Status register for Iprocl 
processor. Fields within status register 
are defined in Appendix L 

M.2.4 Rproc Externally Mapped Registers (0x80 - 0x9f) 

SendRouteBytes 0x80 OutWordBackAnd2TransSpace 0x80 



Makes output logic send route bytes. 
This together with SendTransaction 
and SendEOP delimit the sending of 
transactions and the EOF. 



Bits [7:0] pointer to output buffer 
space. Bit[31] indicates whether 
there is enough free space for 
another two transactions. 



SendTransaction 0x81 



0utByteBackAnd2TransSpace 0x81 
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Makes output logic send transaction. 



Bits [7:0] pointer to output buffer 
space. Bit[31] indicates whether 
there is enough free space for 
another two transactions. 



SendEOP 



0x82 



No read mapping 



Makes output logic send EOP. 



No write mapping 



CheckAckNAck 



0x83 



Read output packet status. Bit[31] 
indicates whether the packet has 
ended. The following read values 
are possible, 

0x00000000 Ack/Nack not yet received 
0x80000001 Ack received 
0x80000004 Packet timed out 
0x80000005 Nack received 



Clear AckNack 



0x84 



No read mapping 



Clears the Ack/Nack status. This 
frees up the logic which buffers 
die Ack/Nack and allows its use 
by another output packet This is 
the final step in sending a packet 
and acknowledges the reading of 
the CheckAckNAck value. 



ReserveTrans 



0x85 



No read mapping 



Reserve the space for a transaction 
to be sent in the output buffer. 



ReplyBufferPtr 0x88 



R/W location 



Returns pointer into reply buffer. 
Writing to this location 
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writes the pointer. Only bits[3:l] 
get written. 



No write mapping 



ReplyBufferValidTrue 0x8b 

Bit[31] indicates whether reply 
buffer is valid. 



StatusReg 



0x8 f 



32 bit Status register for Rproc 
processor. Fields within status 
register are defined in Appendix L 



R/W location 



M.25 Dproc Externally Mapped Registers (OxcO - Oxdf) 



SendRouteBytes OxcO 

SendTransaction Oxcl 

SendEOP 0xc2 

ClearAckNAck 0xc4 

ReserveTrans 0xc5 



OutWordBackAnd2TransSpace OxcO 
OutByteBackAnd2TransSpace Oxcl 



CheckAckNAck 



0xc3 



Output controls, as for reply processor 



WakeupFunction 



0xc7 Processld 



Bits[2:0] writes wakeup function 
into status register of Dproc 



Read process ID of DMA 
processor 0x00000006 



0xc7 



DmaResetTimeSlice 0xc8 DmaDoTimeSlice 



0xc8 



Reset DMA timeslice indicator 



Indicates whether or not the 
DMA timeslice period has been 
exceeded. 



StatusReg Oxcf R/W location 

32 bit Status register for Dproc 
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processor. Fields within status 
register are defined in Appendix L 



DmaWriteBlockSize 


0xd8 


SkipPos 


0xd9 


WriteDataPtrDma 


Oxda 


ReadDataPtrDma 


Oxdb 


SourceDestDif f2to0 Oxdc 


FlushPrevReg 


Oxdd 


DmaDataType 


Oxde 


TLBBlockReadDma 


Oxdf 



These registers control the DMA logic. 

M2.6 Tproc Externally Mapped Registers (OxaO - Oxbf) 



SendRouteBytes OxaO 
SendTransaction Oxal 
SendEOP 0xa2 



Clear AckNAck 
ReserveTrans 



0xa4 
0xa5 



OutWordBackAnd2TransSpace OxaO 
OutByteBackAnd2TransSpace Oxal 



CheckAckNAck 



0xa3 



Output controls, as for reply processor 



WakeupFunction 0xa7 

Bits[2:0] writes wakeup function 
into status register of Tproc 



Processld 

Read process ID of thread 
processor 0x00000005 



0xa7 



SC Instruction 



0xa8 



R/W location 



32 bit instruction register. Used by 
Tproc logic to decode instruction. 



SC InsStatus 



0xa9 



R/W location 



Bit[31] indicates validity of IN 
registers. BitfO] indicates whether 
IN registers are dirty. 
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SC cc 



Oxaa 



Bits[3:0] write Tproc condition 
code bits. 



SC_op_compat 



Oxab 



SC_rd 
SC ANNUL 



Oxab 
Oxab 



Controls execution of Tproc 
instructions on arithmetic 
or logic unit 



Returns number of destination 
register for current Tproc 
instruction in Bits[5:0]. Bit(31] 
indicates annul status if instruction 
is a Bice. 



Indicates that IN register block 
needs to be marked as dirty ie 
need to be written back to store 
if the current dest register is 
an IN register. 



SC_FirstCase 
SC BRANCH 



4'hc 
4'hc 



No write mapping 



Returns case value for Tproc 
instruction in bits [8:0]. Bit[31] 
holds branch status if instruction 
is a Bice. 



SC Immediate 



Oxad 



No write mapping 



Value of Immediate field for current 
Tproc instruction. 



SCJDirtylnsIfDest Oxae 



No read mapping 



StatusReg 



Oxaf 



R/W location 
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32 bit Status register for Tproc 
processor. Fields within status 
register are defined in Appendix L 



VAddrReglnstBlock Oxbc No read mapping 

The 32 bit address is to be 
treated as an instruction 
address for reading. 
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ELAN INTERNAL REGISTERS 



The following is a list of the internal register contents. The ELAN has 72 internal 
32-bit registers which can be accessed as either 32-bit values or 64-bit values. 

Each processor has an area of the internal register map which it can access locally 
without indexing. This is the area where most of the values used by a processor are 
stored. The entire register file can alternatively be accessed by indexing. The local 
accesses are made relative to the following local access constants. 

ThreadJLocals 0x20 

Reply__Locals 0x48 

Command_Locals 0x20 

DMA_Locals 0x10 

Input 0_Locals 0x12 

Inputl_Locals 0x14 

In addition to the local accesses, each processor can make global accesses to the first 
16 locations, OxOO-OxOf, without indexing. 



N.l Internal Register Definitions 

ROUTE TABLE PTR 0x00 



Pointer to base of route table. Individual entries given by 64 byte offset Each entry 
containing four 16 route byte. Pointer is upper 32 bits of a 36 bit physical address. 

CONTEXT_PTR 0x01 

Pointer to base of MMU context table. Entries in turn point to MMU entries for each 
context Pointer is upper 32 bits of a 36 bit physical address. 

QUEUEJSTOREJ3ASE 0x02 // Upper 
QUEUE SIZE 0x02 //Lower 
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Bits [31:5] form upper 27 bits of a 36 bit physical address, pointing to queue store 
base. This is area where the three queues for reply, thread and DMA processors are 
placed, together with the trap area. 

Bits[4:0] form shcnt where (1 « shcnt) is the number of queue entries for each 
processor. 

For example, 0x00000405 indicates that queues begins at address 0x000000400, and 
that each contains (1 « 5) ie. 32 entries. 



OUTPUT CONTEXT 



0x03 



If a processor has the output in use this is the value of context added to each 
transaction for the packet 



REPLY_FPTR 

REPLY_COUNT 

RUN_FPTR 

RUN_COUNT 

DMA_FPTR 

DMA COUNT 



0x04 // Upper 
0x04 // Lower 
0x05 // Upper 
0x05 // Lower 
0x06 // Upper 
0x06 //Lower 



These describe the state of the processor queues for each of the queued processors. 

Bits[31:16] form a 16 bit front of the queue index. Bits[15:0] form a 16 bit value 
giving the number of entries on the queue. 

DMA PACKET SIZE 0x07 



AandC_SAVE 


0x08 //Two words 


TEMP_A 


0x08 // Usable between external 


TEMP_C 


0x09 // memory accesses 


BandD_SAVE 


0x0a // Two words 


TEMP_B 


0x0a // Usable between external 


TEMP_D 


0x0b // memory accesses 


FREE__1 


0x0c 


FREE__2 


OxOd 


FREE__3 


OxOe 


FREE 4 


OxOf 
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These values are used for temporary space by all processors while executing. 

DMA_NoOf RdBytes 0x10 // valid when sending Pack 

DMA__OUTPUT — CONTEXT 0x10 // valid when not sending a packet 
DMAJtoOf WrBytes 0x11 // valid when sending Pack 

TEMPJCNPUT0 0x12 

Temporary register used by IprocO. 

DMAjTmpSource 0x13 // valid when sending Pack 

TEMP_INPUT1 0x14 

Temporary register used by Iprocl. 

DMAJTmpDest 0x15 

dmajroutej:acheo 0x16 

DMAJROUTEJZACHEl 0x17 

16 byte route cache used by DMA between packets to reduce memory accesses. 
When no DMA is running these registers are invalid. Used in addition with 
DMA_ROUTE_CACHE2 and DMAJfcOUTE_CACHE3. 

DMAJDESCJPYPE_CONTEXT 0x18 

DMAJDESC_SIZE 0x19 

DMA_DESC_SOURCE 0x1a 

DMAJDESCJDEST 0x1b 

DMA_DESC_EVENT 0x1c 

DMAJDESCJDESTPROC Oxld 

DMA__DESC_6 Oxle // Not used 

DMA__ROUTE__CACHE2 Oxle // DMA Tmp 

DMA_DESC_7 Oxlf // Not used 

DMA_ROUTE_CACHE3 Oxlf // DMA Tmp 

During DMA execution the 8 word DMA descriptor is held in these registers. This 
descriptor is not updated till the DMA has succeeded. The currently unused parts 
of the DMA descriptor, words 6 and 7 are used during remote DMAs as route cache 
words. 

GLOBAL_0 0x20 

SWAP PC 0x20 //When thread Swapped out 
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When thread processor is timesliced contains value of PC. During thread processor 
execution register is set to zero. When no thread is running this value is undefined. 

B_ADDR 0x21 

PC 0x22 

COMMANDJTEMP 0x22 // When thread Swapped out 

During thread processor execution register contains current PC. When thread isn't 
running it is used as a temporary value by the command processor. 

nPC 0x23 

During thread processor execution register contains current nPC. When thread isn't 
running it is undefined. 

COMMAND_CONTEXT_0_1 0x24 // Cmd 
COMMAND_CONTEXT_2_3 0x25 // Cmd 

Vector of four 16 bit contexts used by command processor in generating event or 
queue commands. Contexts are held litde endian in the words ie command context 
numberO appears in COMMAND_CONTEXT_0_1 bits [15:0] and command context 
number 3 appears in COMMAND_CONTEXT_2_3 bits [31:16]. 

CONTEXT 0x26 

Value of hardware context for current thread processor. When no thread processor is 
running this value is undefined. 

INTERNAL_INTERRUPT_BASE 0x27 // Cmd 

Points to base of internal interrupt event vector. Each entry containing two words 
long. Pointer is upper 32 bits of a 36 bit physical address. 



OUTS 


0x28 


OUTS 1 


0x29 


OUTS 2 


0x2a 


OUTS 3 


0x2b 


OUTS 4 


0x2c 


OUTS 5 


0x2d 


OUTS 6 


0x2e 


OUTS 7 


0x2f 
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OUT registers of thread processor. 



IB BUFFER 


0x30 


IB BUFFER 1 


0x31 


IB BUFFER 2 


0x32 


IB BUFFER 3 


0x33 


IB BUFFER 4 


0x34 


IB BUFFER 5 


0x35 


IB BUFFER 6 


0x36 



IB_BUFFER_7 0x37 

32 byte instruction buffer for thread processor. 



INS 


0x38 


INS 1 


0x39 


INS 2 


0x3a 


INS 3 


0x3b 


INS 4 


0x3c 


INS 5 


0x3d 


INS 6 


0x3e 


INS 7 


0x3f 



IN registers of thread processor. 

INPUT_REPLY_BUFFER_0 0x40 

INPUT_REPLY_BUFFER_1 0x41 

INPUT_REPLY_BUFFER_2 0x42 

INPUT_REPLY__BUFFERJ3 0x43 

INPUT__REPLYJBUFFER_4 0x44 

INPUT_REPLY_BUFFER_5 0x45 

INPUT_REPLY_BUFFER_6 0x46 

INPUTJREPLYJBUFFERJ7 0x47 

Space used by inputters to build up reply descriptor prior to placing it on the reply 
queue. 

REPLY_BUFFER_0 0x48 

REPLY__BUFFER_1 0x49 

REPLY_BUFFER_2 0x4a 

REPLY_BUFFER_3 0x4b 

REPLY_BUFFER : _4 0x4c 

REPLY__BUFFER_5 0x4d 

REPLY__BUFFER_6 0x4e 

REPLY BUFFER 7 0x4f 
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Space for a single reply descriptor. This is used to queue a reply if the reply 
queue is empty, so reducing MBus memory accesses. An external register location 
ReplyBufferValid indicates whether this buffer is in use. The buffer is also used 
during execution of a reply descriptor and so if the reply processor traps the 
ReplyBufferValid bit must be cleared explicitly by the trap handler. 

INPUTOJBUFFERJ) 0x50 

INPUT0_BUFFER_1 0x51 

INPUT0_BUFFER_2 0x52 

INPUT0_BUFFER_3 0x53 

INPUT0__BUFFER_4 0x54 

INPUT0_BUFFER_5 0x55 

INPUT0JBUFFER_6 0x56 

INPUT0_BUFFER_7 0x57 

INPUT 0_BUFFER_8 0x58 

INPUT0_BUFFER_9 0x59 

INPUT0_BUFFER_10 0x5a 

INPUT0J3UFFERJL1 0x5b 

INPUT0_BUFFER_12 0x5c 

INPUT0_BUFFERJL3 0x5d 

INPUT0_BUFFERJL4 0x5e 

INPUTOJBUFFERJ. 5 0x5f 

Input transaction buffer used for IprocO. 

OUTPUT_BUFFER_0 0x60 

OUTPUT_BUFFER_l 0x61 

OUTPUT_BUFFER_2 0x62 

OUTPUT JBUFFER_3 0x63 

OUTPUT_BUFFER_4 0x64 

OUTPUT_BUFFER_5 0x65 

OUTPUT_BUFFER_6 0x66 

OUTPUT_BUFFER__7 0x67 

OUTPUT_BUFFER_8 0x68 

OUTPUT_BUFFER_9 0x69 

OUTPUT_BUFFER_10 0x6a 

OUTPUT__BUFFER_ll 0x6b 

OUTPUT_BUFFER_JL2 0x6c 

OUTPUTJBUFFER_13 0x6d 

OUTPUT_BUFFER__14 0x6e 

OUTPUT_BUFFER__15 0x6f 

OUTPUT_BUFFER_16 0x70 

OUTPUTJ3UFFERJ.7 0x71 

OUTPUT BUFFER 18 0x72 
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0UTPUT_BUFFERJ.9 0x73 

OUTPUT_BUFFER~20 0x74 

OUTPUTJ3UFFER_21 0x75 

OUTPUT_BUFFER_22 0x76 

OUTPUT JBUFFER~2 3 0x77 

OUTPUTJBUFFER~24 0x78 

OUTPUT_BUFFER_25 0x79 

OUTPUT_BUFFER_26 0x7a 

OUTPUT_BUFFERJ27 0x7b 

OUTPUT_BUFFER_28 0x7c 

OUTPUT_BUFFER_29 0x7d 

OUTPUTJBUFFER_30 0x7e 

OUTPUT__BUFFER_31 0x7f 

Output buffers used by reply, thread and dma processors to formulate packets 
including route byte headers. Additionally the DMA processor uses the output buffer 
as a temporary storage during memory memory DMA. 

INPUT1_BUFFERJ) OxdO 

INPUT1_BUFFERJL Oxdl 

INPUT1JBUFFER_2 0xd2 

INPUT1J3UFFERJ3 0xd3 

INPUT1JBUFFER~4 0xd4 

INPUT1_BUFFER_5 0xd5 

INPUT1_BUFFER_6 0xd6 

INPUT1_BUFFER_7 0xd7 

INPUT1JBUFFER_8 0xd8 

INPUT1_BUFFER_9 0xd9 

INPUT1_BUFFER_10 Oxda 

INPUT1_BUFFER_11 Oxdb 

INPUT1_BUFFER_12 Oxdc 

INPUT1__BUFFER_13 Oxdd 

INPUT1_BUFFER_JL4 Oxde 

INPUT1J3UFFERJL5 Oxdf 

Input transaction buffer used for Iprocl. 
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OVERVIEW 



1.1 Introduction 

The Meiko Elite Packet Switch Chip has eight bi-directional byte wide Elan links, and 
an 8x8 cross-bar switch with broadcast capability. An Elite routes and checks Elan 
packets from any input link to one or more output link. An Elite can be intergrated 
and controlled via an IEEE Standard Test Access Port (TAP). Routing is controlled by 
protocol sent with the packets. Elite switches within a network attempt to implement 
a distributed fair round robin arbitration scheme across the whole network, giving 
all sending processors equal priority to each receiving processor. Each link has a 48 
byte flow control FIFo, allowing Elite switches to be connected by up to 25 meters 
of cable without any loss of bandwidth. 

1.2 Major Objectives 

The major objectives for the switch processor are: 

To have high link bandwidth. (Total switch bandwidth = 600 Mbytes/sec) 

To pass packets across the switch from link to link with minimum latency. 

To return PACK/PNACKs across the switch from link to link with minimum 
latency. 

To impliment broadcast communications. 

To impliment a switch network wide fair round robin arbitration scheme. 

To check the integrity of packets transmitted across the switch. 

To handle errors with minimum disruption to other packet traffic on the switch 
network. 



13 Elan Links 

The links are described in detail appendix A. 
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8X8 CROSS-BAR SWITCH 



2.1 Route byte format 

Packets entering a switch chip must start with a route byte. This is used to select an 
output, or group of outputs, to transmit the packet to. The route byte is the first byte 
of the packet as seen by the switch chip. The switch chip uses the route byte and then 
removes the route byte from the packet If the packet has to travel through a number 
of switch chips, then the route bytes for each switch chip are placed in the order they 
will be used as the packet moves from Elite switch to Elite switch. When the packet 
arrives at the destination processor or processors, all the route byte(s) will have been 
deleted from the front of the packet. The route byte has four fields as shown below:- 

RouteByte[2:0] Lower bound of output ports. 
RouteByte[3] Priority bit. This must be set for 

broadcast transmissions. 
RouteByte[6:4] Upper bound of output ports. 
RouteByte[7] Odd Parity bit. 

For normal point to point connections across a switch, both the upper and lower bound 
values must have the number of the output link. The priority bit is normally reset, 
and the parity bit is set to odd parity for the route byte. The whole packet will be sent 
to the addressed output link, and the PAck or PNAck, received at the output link, will 
be returned back across the switch and sent to the link that received the packet. 

e.g. 

1 Link 3 receives a packet with a route byte of value 0x9 1 . 

2 Both the upper and lower bounds have the value 1 . 
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The packet will be routed to Linkl, and the ack received by Linkl will be sent back 
outofLink3. 

For a broadcast connection across the switch the lower bound has the number of the 
lowest output link and the upper bound has the number of the highest output link. 
The priority bit must be set, and the parity bit is set to odd parity for the route byte. 

eg. 

1 Link 6 receives a packet with a route byte of value 0x49. 

2 The upper bound has the value 4. 

3 The priority bit is set 

4 The lower bound has the value 1 . 
The effect is :- 

• The packet will be routed and copied to Links 1 ,2,3 and 4. 

• When all the links 1 to 4 have received an ack (PAck or PN Ack) then an ack will 
be sent out of link 6. 

• If any of the links 1 to 4 receive a PNAck then a PNAck will be sent from link 6. 

• Only if all the links 1 to 4 receive a PAck will a PAck be sent by Link 6. 



2.2 Distributed round robin arbitration 



The Elite switch chip implements a fair 'switch network wide' round robin arbitration 
scheme. Within each switch the switch output for a link, when a connection is made, 
round robin arbitrates onto a switch input The priority of arbitration is as follows: 
After chip reset switch input is the highest priority and switch input 7 is the lowest 
priority. So if the first two packets arriving at the switch arrive at the same time 
and require routing to the same switch output, then the switch input with the lowest 
number is connected first On the next arbitration, the switch input that was last 
connected becomes the lowest priority, and the next switch input (last switch input 
+ 1) becomes the highest priority. The priority increases as the switch input number 
increases, wrapping from switch input 7 back to switch input 0. A switch output 
does not always re-arbitrate at the end of a packet All packets end with one End 
of Packet (EOP), but only packets ending with two successive EOPs allow a switch 
output to re-arbitrate. An Elan processor always ends a packet with two EOPs. As 
the packet moves through the switch network, the second EOP may be deleted by an 
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Elite switch. This will happen if another switch input is waiting to be connected to the 
switch output passing the packet If only one EOP is seen on the end of a packet, then 
the switch output will not re-arbitrate. In other words, two EOPs means the switch 
output may re-arbitrate, one EOP means the switch output may not The second EOP 
will not be deleted if the arbitration wraps from switch input 7 to switch input 0. e.g. 
if switch inputs 1 , 3, and 5 are all receiving back to back packets destined to the same 
switch output and each ending with two EOPs, then the order the switch inputs are 
selected is 1-3-5-1-3-5-1-3- 5... Packets from switch inputs 1 and 3 will 
have their second EOP deleted, but the packets received at switch input 5 keeps its 
second EOP as the arbitration at the end of the packet from switch input 5 involves 
an arbitration back to switch input 1 that wraps from switch input 7 to switch input 
0. If a packet is received at a switch input with only one EOP, and the next packet 
received at the same switch input is not to be routed to the same switch output, then the 
switch output adds a second EOP to the first packet so that subsequent Elite switches 
are not locked onto an idle switch input However, the second EOP is not added if 
another switch input is waiting to use the switch output following the rules above. 
The distributed round robin only functions correctly provided all the links involved 
are trying to communicate to a single Elan processor. 

23 Priority arbitration 

An additional round robin arbiter has been included to enable arbitration between 
incoming broadcast packets. If the priority bit (bit 3 in the route byte) is set then 
that switch input has priority over normal connections. This mechanism has been 
introduced to enable broadcast communications to operate correctly. If two broadcast 
communications arrive simultaneously at two switch inputs and the set of switch 
outputs that they are to connect to overlaps then it is probable that they will each 
connect to some, but not all, of their switch outputs and each be left waiting for switch 
outputs that the other has connected with. In this case the switch will deadlock. In 
order to prevent this, a priority request round robin arbiter has been added. When the 
priority bit in the route byte is set then before any requests to connect to any switch 
outputs is made a priority round robin request is made. When the priority request is 
acknowledged, all other switch inputs are inhibited from making their requests, and 
the acknowledged switch input is guaranteed to connect to all the switch outputs 
after any packets being passed those switch outputs has gone by. Re-arbitration 
will occur even if the packets being passed only end with one EOP. The priority 
bit must be set for broadcast communications but must never be set for point to point 
communications. 
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2.4 Broadcast communication 



A switch network made using Elite switch chips and connecting Han communication 
processors together can be viewed as a linear array of Elan processors numbered 
to n connected by a tree of Elite switch chips. Broadcasts are made by sending a 
packet to the top of the tree and then copying the packet to more than one switch 
output as the packet moves back down the tree. In this way it arrives at many Elan 
processors at the bottom of the tree. The broadcast set is contiguous in the range X 
to Y where <= X <= Y <= n. Each Elite switch has a number of links going up in 
the network the rest going down towards the Elan processors. The links going down 
always start with link 0. Broadcasts may only spread out going down the network. 
Each Elite has a value called its BroadcastTop. This should be set to highest link 
number going down. So, for example, if links to 3 go down and links 4 to 7 go 
up, then the BroadcastTop value should be set to 3. While an Elite is in reset the 
notEnor pin is tri-state. When an Elite comes out of reset the value on the error pin 
is sampled. If the value is high then BroadcastTop is set to 3. If the value is low then 
BroadcastTop is set to 7. The error pin can be pulled by a resistor to the appropriate 
level. Other values of BroadcastTop can be set by writting to the GlobaiReg using 
the JTag interface. The route bytes required to make a broadcast connection are as 
follows :- 

1 A set of normal point to point route bytes are required to send the packet 
to the top of the switch network tree. 

2 A set of broadcast route bytes back down the tree describe the contiguous 
set of Elan processors being communicated with. 

Each broadcast route byte has a 3 bit upper and a 3 bit lower bound number describing 
on which links the packet is to leave the Elite. The First broadcast route byte describes 
which links the packet should leave on the top most Elite switch chip within the 
switch network. As the network is a tree, and we require a contiguous range of Elan 
processors to be communicated with, then the middle downward links of this Elite 
should then pass the packet to every Elan processor on a downward path from these 
links. For this reason the top Elite modifies all subsequent route bytes for this packet 
(to be used by all the Elite chips below the top Elite switch chip) to say "output on all 
downward links". The value chosen for this is "broadcast 7 to 0". Zero is always the 
lowest downward link from any elite. Seven is the highest possible. When an upper 
bound of seven is received by an Elite on a broadcast route, then if the BroadcastTop 
value is not seven, then the higher bound is changed to the BroadcastTop value. The 
least significant link being broadcast to on the top Elite switch chip only requires the 
upper bound to be modified on all subsequent route bytes, and the most significant 
link being broadcast to on the top Elite switch chip only requires the lower bound to 
be modified on all subsequent route bytes. All other lower Elite switches apply the 
same rules. Most will receive a route byte with an upper bound of 7 and a lower bound 
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of 0. These routes cannot be modified further. The Elite switches on the edges of the 
downward formed broadcast pyramid will have to modify the route bytes leaving 
them that are no longer on the edge of the pyramid. So the broadcast route bytes sent 
by the sending Elan processor consist of a list of upper and lower bounds, where the 
upper bounds describes the most significant edge of the broadcast pyramid, and the 
lower bounds describes the least significant edge of the broadcast pyramid. As route 
bytes may be changed then the parity bit may also have to be changed if the new value 
of route byte requires it Note, the parity bit is not regenerated, it is only changed. 
Hence if the route byte was corrupted, then the new value will still say that the route 
is corrupted. 

The sending Elan processor is only allowed to send it's EOP when it receives either 
an PACK or a PNACK. A receiving Elan processor will only send a PACK when the 
'Ack now* bit is set within a transaction. An Elite switch is only allowed to send 
a PACK or a PNACK when it has received a PACK or PNACK on all of the links 
it sent the packet out of. A PACK will only be sent back from an Elite switch if a 
PACK was received on all of the links it sent the packet too. If any of the output 
links received a PNACK then when all of the other links have received a PACK or 
PNACK, a PNACK will be sent back to the sending Elan. The packet is held open 
within an Elite switch until an EOP is received from the sending Elan processor. The 
full sequence of events for a broadcast communication is as follows :- 

1 A set of normal point to point route bytes are sent by the sending Elan 
processor to take the packet to the top of the switch network tree. 

2 A set of broadcast route bytes are sent by the sending Elan processor 
to take and broadcast the packet back down the tree to all of the Elan 
processors being communicated with. 

3 A Start of packet (SOP) command is sent 

4 One or more transactions are sent One of them must have the 'Ack now' 
bit set 

5 When the transaction with the 'Ack now' bit set is received by the Elan 
processors then they return either a PNACK or PACK. 

6 When all the PNACKs and PACKs have been merged and returned back 
across the switch network a single PNACK or PACK is received by the 
sending Elan processor. 

7 Two EOP commands are sent by the sending Elan processor. 
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8 As the EOPs move across the Elite switches, the switch connections are 

disconnected and made available for other communications. 
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ERRORS, ERRORS HANDLING, 
AND RESET PROPAGATION 



Errors can be observed through an ERROR pin. The precise nature of the error can 
be found, and if necessary the error may be corrected, using the TAP. When the Elite 
detects an error, it will close and remove any effected packets, and if possible, will 
tell the sending and receiving Elan processors that an error has occurred. 

Errors on the switch chips fall into three categories 

1 Link line protocol errors. 

2 Routing and packet protocol errors. 

3 Timeout errors. 

Each of the eight links has an error flag register. If an error is detected, then the 
appropriate bit is set in the error flag register. All the error flag bits are ORed together 
for all the links to form an Elite error value that is output on the .ERROR pin. Once 
an error bit is set, it can only be reset via the Test Access Port All error bits are reset 
by Chip reset 

3.1 Link line protocol errors 

There are two line protocol errors that can be detected :- 

1 Received Phase Error. 

2 Received Data Error. 

A Received Phase Error occurs when the link receiver was unable to keep phase 
aligned to incoming link data. This may occur because of :- 

1 Excessive frequency drift between the transmitter and the receiver, 
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2 Excessive line clock jitter, caused by grid bounce etc. 

3 Noise on the line causing multiple clocks to be received. 

A Received Data Error occurs when the incoming data is not a valid command value 
or a valid data value. 

When a link line protocol error occurs, both the link transmitter and link receiver are 
put into reset for at least 256 clock cycles, and any packets being transmitted by the 
link are correctly terminated and if possible NACKed. 

3.2 Routing and packet protocol errors 

Elan packets consist of route bytes, a SOP, one or more transactions, then an EOP. 

The route bytes each have a parity bit All the route bytes entering an Elite have 
their parity checked. If any of the route bytes has a parity error then the route parity 
error bit is set and the packet is terminated. When the packet is terminated a NAck is 
returned to the sending Elan, the rest of the packet being thrown away, and aBad EOP 
(SOP EOP) is sent on to the receiving Elan. This may or may not reach the receiving 
Elan depending on which Elite switch and in which route byte the error occurred. 

Each transaction within a packet ends with a two byte CCITT CRC polynomial. This 
is used to check the integrity of the transactioa As a packet moves through an Elite, 
all the transactions are checked. The transaction CRC error bit will be set if either the 
CRC value does not match the transaction, or the transaction ends prematurely with 
either an EOP or a Bad EOP. The size of a transaction is defined in bits six to four of 
the first byte of the transaction. 

Total in bytes 
(Inc crc) 
10 
18 
26 
42 
74 
138 
266 
522 

Using the TAP the setting of the CRC error bitmay be disabled if the Elite is to be used 
for switching non standard Elan packets. Current Elan processors will only generate 
transactions of 10, 18, 26 or 42 bytes in size. If an error is detected and enabled then 
the rest of the packet sent to the Elite is consumed, a Bad EOP is sent on from the 
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First 


Byte 


Trans size 


[6:4] 


in 


words 









2 


1 






4 


2 






6 


3 






10 


4 






18 


5 






34 


6 






66 


7 






130 



Errors, Errors Handling.and Reset propagation 



Elite, and if a PACK or PNACK has not already been returned a PNACK is returned 
to the sending Elan processor. If the error is detected on the last transaction within 
the packet and the EOP of the packet immediately follows the last transaction, then 
it is possible for the packet to leave the Elite before a Bad EOP can be placed on the 
end of the packet However the transaction CRC error bit for that link input will still 
beset 

3 J Timeout errors 

There are two sources of timeout error for each link. They are :- 

1 Packet open timeout 

2 Packet waiting timeout 

An Elite switch input is considered open from the time that the first byte of a packet 
moves across the switch to the time that switch is able to re-arbitrate to a new input 
If this time exceeds 240 to 480ms for a 70MHz switch network then the switch will 
disconnect and reset the link connected to the output of the switch. It will also set the 
Input Open Timeout error bit for the link connected to the input of the switch. 

If, when a packet arrives at a switch input, the switch is unable to connect to the 
desired switch output because another switch input is connected to that switch output 
then that input is considered to be waiting. Each link input has a two bit Waiting 
timeout control register. This register controls the behaviour of the switch input if the 
input is waiting for longer than the waiting timeout value. The register is accessed 
using the TAP and is defined below :- 

1 Bit Enable automatic Nack and gobble. 

2 Bit 1 Enable setting of input open timeout error bit 

When bit is set then if the waiting timeout value is exceeded, the whole packet is 
consumed and a NACK is returned to the sending Elan processor. Nothing is sent 
on as the switch did not connect to any switch output When bit 1 is set then if 
the waiting timeout value is exceeded, the Input Open Timeout error bit for the link 
connected to the switch input is set If bit 1 is set on its own, then the only action is 
to set the error bit i.e. the packet is not terminated, and if the switch output that the 
input is waiting for becomes free, the packet is transmitted as normal. On chip reset 
both bits are cleared. 

The waiting timeout value is set by loading the appropriate three bit value into the 
TAP global register. The waiting timeout values for an Elite with a 70MHz comms 
clock is given below :- 
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Global reg[: 


2:0] 


Timeout period 







22-2 9us (Reset value) 


1 




44-58us 


2 




88-117uS 


3 




176-234uS 


4 




352-468uS 


5 




704-936uS 


6 




1408-1873us 


7 




2816-3745uS 



3.4 Reset Propagation 

If the output link on an Elan processor has been open for more than the timeout 
duration, then the Elan must clear the packet being communicated. This may have 
occurred for one of the following reasons :- 

1 One or more of the transactions in the packet where invalid. 

2 No * Ack now' bit had been set in any of the packets transactions. 

3 The routes used to send the packet where not valid. 

4 Part of the network to be used by the packet is broken. 

5 The PACK or PNACK was lost by a bad transmission. 

6 The inputer on the receiving Elan processor has been disabled. 

7 Erronious code on the sending Elan processor has not 'Closed' the packet 

8 Congestion of the switch network prevented the packet from reaching the 
receiving Elan processor within the timeout period. 

For any of the above reasons, and possibly other reasons, the packet must be capable 
of being cleared. The whole packet could be stored in the flow control FiFos of each 
of the links used by the packet When the Elan outputer times out itputsitsoutputing 
link into reset This causes the RESET command to be sent down link. If an Elite 
switch chip receives a RESET command, and is connected to one or more output 
links, then the reset value is propagated across the switch and causes all of the output 
links connected to the inputing link to put their transmitters into reset and transmit a 
single RESET command. The output will clear its output byte count and then ignore 
any TOKEN commands it receieves for the next 64 to 128 cycles. The output will 
continue to supply GAP commands for line syncronization and also will continue to 
return PACK and PNACK commands for any packets being sent down the link in the 
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other direction. If an elite receives a single RESET command then it clears its input 
fifo and will send on a reset to any outputs that the input is connected to. The input 
will then wait 256 to 512 cycles before returning and TOKENS back to the outputer. 
In this way the whole packet is cleared from the switch network. If the reset reaches 
the receiving Elan processor, then it is interpreted as a BAD EOP by the inputer of 
the Elan processor. 

The RESET command is only transmitted in one direction across a link. Any packets 
being transmitted in the other direction are not affected. If a packet times out, the 
whole of that packet is cleared from the switch network without affecting other 
packets apart from blocking other packets trying to use a link in the same direction 
as the timed out packet 

If a link is put into reset by either a hard data or phase error, or by the jtag or chip 
reset pin, then both the sender and receiver are put into reset and a timeout is sent in 
both directions. 

3.5 Disabled Links 

A link on an Elite switch chip can be disabled by sending a command on the TAR 
When a link is disabled, the link is put into reset and outputs a RESET command. 
Any packet that arrives to be transmitted out of the link from the switch is consumed 
and PACKed. For most errors it is normal to PNACK and consume a packet, but 
the disabled link PACKs. This is necessary if portions of the switch network are to 
be partitioned off and broadcast communications are still able to operate around the 
partitioned area. 



mekO DRAFT May 27, 1994 3*5 







3^6 



meko 



TEST ACCESS PORT (TAP) 



The Elite switch chip is control by and conforms to the TERR Standard Test Access 
Port (IEEE Std 1149.1-1990). The TAP has the full five pin interface of :- 

1 TCK Test Clock Input 

2 TMS Test Mode Select Input. 

3 TDI Test Data Input 

4 TOO Test Data Output 

5 TOST* Test Reset Input 

The instruction register is six bits wide. All the mandatory instructions BYPASS, 
SAMPLE/PRELOAD, and EXTEST are included. The optional IDCODE instruction 
is also included. Private instructions have been included that perform the following 
functions :- 

1 Global register access. 

2 Timer register access. 

3 Individual link error value register access. 

4 Individual link error bits register access. 

5 Individual link reset register access. 

6 Individual link timeout control register access. 

7 Equivalent instructions to SAMPLE/PRELOAD, and EXTEST for each 
of the eight links. 

8 Eight instructions to read link control state. 

The full list of instructions is given in appendix B. The full list of register bit allocation 
and meanings is given in appendix C. 
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MEIKO BYTE-WIDE LINK 
LINE-PROTOCOL 



A.1 Link Connection 



The basic characteristics of Meiko links is that they are; byte wide; bidirectional; 
point to point; and high bandwidth (>50 Mbyte/s each direction). Each link consists 
of 20 wires; 10 for the input port, and 10 for the output port. Each port has one clock 
wire and nine data lines. On both positive and negative transitions of the ChanClkln 
wire the Chanln wires are sampled. The output port sets up a new data pattern on 
ChanOut at the start of each communications clock period, and toggles ChanQkout 
in the middle of each period. 



output port 

ChanClkOut 
ChanOut [8] 



input port 

ChanClkln 
Chanln [8] 



ChanOut [0] 



Chanln [0] 



A.2 Link Values Encoding 

The line protocol has eight command values, as well 256 data values encoded. No 
single bit error can change data into a command or visa-versa. No single bit error can 
change one command into another. 



The commands are as follows: 

Command Code 

NULL {3'h7,3'h0,3'h0} 
GAP {.3'h7,3'hl,3'hl} 

SOP {3'h7,3'h2,3'h2} 



Usage 

Nothing to be sent. 
Used for bit stuffing to get 
receiver in sync with sender. 
Start of packet. 
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EOP {3'h7,3'h3,3'h3} End of Packet. 

TOKEN {3'h7,3'h4,3'h4} Receiver can accept 16 more 

bytes of data. 
PNACK {3'h7,3'h5,3'h5} Packet Not Acknowledge. 
PACK {3'h7,3'h6,3'h6} Packet Acknowledge. 
RESET {3'h7,3'h7,3'h7} Sender is in reset. 

The order of priority of sending commands and data is shown below: 

Highest priority Lowest priority 

RESET PACK TOKEN EOP Data GAP NULL 
PNACK SOP 

The outputer attempts to output a GAP every 256 cycles. If having waited 128 cycles 
a GAP has still not been sent because the line has been continuously busy, then a 
GAP is sent in preference to data. When a GAP command is transmitted, it must be 
followed by a NULL command. The NULL following a GAP is higher priority than 
everything except the RESET command. 

The data bytes are encoded in four ranges, as follows: 

8'h00 - 8'h3f have the value {3'bOOO, Data[5:0]} 

8'h40 - 8'h7f have the value {3'b001, Data[5:0]} 

8'h80 - 8'hbf have the value {3'b010, Data[5:0]} 

8'hcO - 8'hff have the value {3'bl00, Data[5:0]} 

At least two of the top bits would have to change before the data byte could possibly 
be interpreted as a command byte. Errors which change the data values into other 
data values must be detected by error checking the packet contents at the packets 
destination. 

Packets are made up of route bytes, a SOP, one or more transactions, and one or two 
EOPs. One PACK or PNACK is returned for each packet sent The line protocol does 
not distinguish in any way between packets which have been PACK or PNACKed. If 
a packet is terminated prematurely by a line data error, the packet is terminated with 
an SOP EOP. This signals to the inputer that the packet was incomplete. 

RESET, TOKEN, GAP and NULL are only used by point to point links and are 
invisible to higher levels of protocol. 
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A.3 Flow Control 



The input port has a FiFo. Data bytes and EOP commands can be stored in the FiFo 
while a switch or inputer is unable to take the data. The protocol allows the FiFo to be 
as deep as necessary to take up the delays in the line, but in the first implementations 
of links it is intended that the FiFo be 48 bytes deep. The size of the byte count 
register must be sufficient to cope with the largest FiFo it can be connected to, which 
may be greater than its own FiFo size. In the initial implementation this will be 8 
bits, allowing FiFos up to 256 bytes to be connected. 

Any time an input port has 16 or more bytes of space in its FiFo, it instructs its output 
port to send a TOKEN command and decrements its space available count by 16. 
This effectively transfers the ownership of 16 bytes of FiFo space from the inputer 
to the outputer connected to it Each byte consumed by the inputer frees up one byte 
of space in the FiFo. Note that the effective size of the FiFo is reduced by between 
and 15 bytes at any point in time because of the 16 byte granularity of the token. 

When an inputer receives a TOKEN command it instructs its paired output port to 
increment the count of the number of bytes it may send by 16. Each time the output 
port sends a byte it decrements this count by one. As long as the count is greater than 
zero the output port is allowed to transmit data, or commands that consume FiFo 
space, (EOP or SOP). 

Data is transmitted as packets. All packets must end with an EOP command. All 
packets must be either PACKed (Packet Acknowledge), or PNACKed (Packet NOT 
Acknowledge). Acknowledgements are passed back along the route that the packet 
took. If the outputing processor traps while it has its output open it will immediately 
send an EOP. An EOP generated in this way is termed an unsolicited EOP. If an 
Elite switch chip detects an error, then it will generate a BAD EOP command 
(this is an SOP immidately followed by an EOP command). A BAD EOP can be 
generated before a PACK/PNACK and hence be interprited as an unsolicited EOP. 
The system must ensure that any PACK or PNACK being returned for that packet 
is not interpreted as being for a following packet To ensure this, unsolicited EOP 
commands are handshaken in the following way. The EOP command can be issued 
before a PACK/PNACK has been received, but another packet cannot be transmitted 
along the line before the PACK/PNACK is received. The line is kept open for a 
packet until both the EOP is sent and the PACK/PNACK is received. If a receiver 
port receives an EOP before the transmitter has sent a PACK/PNACK then a PNACK 
is automatically sent by the transmitter. Any PACK or PNACK commands received 
after an EOP has gone by, and before another packet has started, are deleted. 
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A.4 Links and Reset 

A link is held in reset when: 



1 The Reset pin of the chip is high. 

2 A JTag port holds the link in reset 

3 The input port is not receiving a clock from the line. 

4 The input port is receiving RESET commands from the line. 

5 The outputer has been disabled by a JTag port 
A link is put into reset for at least 256 clock cycles when :- 

1 The value clocked in on Chanln is neither a command nor Data. 

2 The inputer has a phase error. (Due to excessive drift, jitter or double 
clocking on the ChanlnClk) 

3 The link needs to be cleared because of a timeout. 

4 A JTag port clears the link. 

When a link is put into reset the link has a defined state. This is :- 

1 No packets are being sent in either direction. 

2 No PACK/PNACK is outstanding. 

3 The flow control HFb is empty, but the receiver owns all the space in 
the FiFo. i.e. TOKEN commands must be sent before any data can be 
received. 

4 The transmitters count of bytes it may send is set to zero. 

5 Any packets being sent when the link was put into reset are completely 
consumed, and if possible, NACKed. 

6 Any packets being received when the link was put into reset are ended 
with a BAD EOP (SOP, EOP); 
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7 The link is forced to output the RESET command, except if the link is 

reset by receiving a RESET command. If the link is receiving the RESET 
command then the transmitter sends the NULL command. 

If anoutputeris inthemidstof sending a packet when it is put into reset, the remainder 
of the packet is consumed but not transmitted, and a PNACK returned to the sender. 
The outputer will consume the rest of the packet up to the EOP, even if the link is 
taken out of reset New packets arriving at a link which is in reset are held until the 
link is taken out of reset 

If an inputer is receiving a packet when it is reset, it forwards a SOP, followed by 
an EOP. If this occurs while route information is being sent this forces the message 
to terminate. If the error occurs during the data part of the packet the SOP is passed 
in to the inputing communications processor, which detects it as an error which does 
not require acknowledgement 

This mechanism insures that reset can be forced at any time on a port in a way which 
can be detected and recovered from by all devices (either processors or switches) 
using that port The reset mechanism will always reset the link in both directions; 
this is essential as no direction can be deduced from an erronious command or data 
item. Reset is however only sent in one direction across the link, form the end that 
detected the error. Resets are propogated along the currently connected routes so that 
an entire blocked packet is flushed out 

A.5 Clock Skew Tolerance 

The output port generates both the data and the Clock. Any variation in voltage, 
temperature, or process, should not cause the skew between the clock and data, as 
seen by the receiver, to vary. The data is clocked on both positive and negative edges 
of the clock. Therefore the maximum frequency of any pin is half the peak data rate. 

A.6 Clock Phase Locking and Control 

Both directions of a link must transmit at the same frequency. To avoid having to 
distribute a global clock, marginal (<200ppm) frequency variations are permitted. 
Receivers use the same frequency as their transmitters, and so have an unknown, 
and slowly changing phase difference with respect to the data coming in on the line. 
Inputers can correct for this by inserting or removing NULL commands. The points 
at which corrections can be made are signaled by GAP commands. GAP commands 
must be sent often enough to insure that the maximum frequency drift is always 
compensated for before it can cause phase errors. 
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Each receiver has a phase detecting circuit and a short FIFo. Data is clocked into the 
F1F0 using the clock sent with the data. The data is clocked out of the HFo using 
the receivers local clock. The HFo is three entries deep. The minimum possible 
latency through the FiFo is zero cycles, and the maximum possible latency through 
the FiFo is three cycles. The receiver monitors the latency through the FiFo, and tries 
to maintain a 1 to 2 cycle latency. At regular intervals, the sender transmits a GAP 
command. The GAP command is always followed by a NULL command. When the 
receiver receives a GAP then, if the measured latency is greater than two, because 
of clock drift or transmission delay drift, then in one cycle the receiver can remove 
both the GAP and the NULL from the FiFo. This will reduce the latency though the 
HFo by one cycle. If, when the receiver receives a GAP, the measured latency is less 
than one cycle, then the receiver does not take anything out of the HFo for one cycle. 
This will increase the latency by one cycle. 

A.7 Automatic Link Output Tri-state 

The link has an automatic link output tri-state function. This is included to enable 
hot insertion of circuit boards within a switch network. When a link is operating 
normally, it will be receiving an edge on the OianOkln pin every clock cycle. If 
the link is disconnected then the ChanOkln will stop oscillating. A very weak pull 
down resistor on the ChanOkln pin will pull the input to gnd. If the ChanOkln pin 
is read as zero without being read as a one for at least 256 Comms cycles, then all 
the output pins of the link out will be tri-stated. The OianOkOut pin has a weak (but 
not very weak) pull up resistor. So if a board is re-inserted, and the link connection 
made again, when the power is restored the OianOkln pins of both ends of the link 
will be read as a one (because the weak pull up resistor wins over the very weak pull 
down resistor). When the OianOkln pin has been read as a one for more than 10ms 
the link output pads will be taken out of tri-state. While a link is in tri-state, the link 
is held in reset Links will be automatically tri-state and untri-state regardless of the 
state of Gup Reset The only exception is if the link is being boundary scanned using 
the TAP. In this case the link will be forced out of tri-state. 
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ELITE TEST ACCESS PORT 
INSTRUCTIONS 



The Test Access Port instruction register is six bits long. During the Capture-IR state 
the instruction register is loaded with the following. 

{2'b00, ChipReset, ErrorFlag, 2'bOl} 

The following is a list all the instructions with the hex code and number of data bits. 



Hex 


No of 


Instruction Extra Function 


Code 


Data 
bits 


Name 


00 


164 


ExTest Also resets all links 


01 


164 


SamplePreload 


02 


11 


ReadAndWrtGlobalControl 


03 


11 


ReadGlobalControl 


04 


24 


ReadAndWrt Timeout Reg 


05 


24 


ReadTimeoutReg 


06 


72 


ReadErrorValRegs 


07 


72 


ReadErrorValRegs 


08 


16 


ReadAndWrt Re set Control 


09 


16 


ReadResetControl 


0a 


16 


ReadAndWrtTimeout Control 


0b 


16 


ReadTimeoutControl 


0c not used 


Od not used 



Oe 40 ReadAndClearErrors Clear individual error 

bits 
Of 40 ReadErrors Read all switch error 

bits 

Also resets link 

Also resets link 1 
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10 


20 


ExTestLinkO 


11 


20 


SamplePreloadLinkO 


12 


20 


ExTestLinkl 


13 


20 


SamplePreloadLinkl 



14 


20 


ExTestLink2 


15 


20 


SamplePreloadLink2 


16 


20 


ExTestLink3 


17 


20 


SamplePreloadLink3 


18 


20 


ExTestLink4 


19 


20 


SamplePreloadLink4 


la 


20 


ExTestLink5 


lb 


20 


SamplePreloadLink5 


lc 


20 


ExTestLink6 


Id 


20 


SamplePreloadLink6 


le 


20 


ExTestLink7 


If 


20 


SamplePreloadLink7 


20 




Bypass 


21 


15 


ReadStateLinkO 


22 




Bypass 


23 


15 


ReadStateLinkl 


24 




Bypass 


25 


15 


ReadStateLink2 


26 




Bypass 


27 


15 


ReadStateLink3 


28 




Bypass 


29 


15 


ReadStateLink4 


2a 




Bypass 


2b 


15 


ReadStateLink5 


2c 




Bypass 


2d 


15 


ReadStateLink6 


2e 




Bypass 


2f 


15 


ReadStateLink7 


30- 


3c 


Bypass 


3d 


6 


IDCode 


3e 




Bypass 


3f 


1 


Bypass 



Also resets link 2 

Also resets link 3 

Also resets link 4 

Also resets link 5 

Also resets link 6 

Also resets link 7 
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ELITE TEST ACCESS PORT 
REGISTERS 



The following is a list of Elite internal registers that can be accessed using the Test 
Access Port The registers are not all the same length. 

C.1 External scan shift path. 

The following is used by the instructions SAMPLE/PRELOAD, and EXTEST. 

Whole scan path is:- 

{Link7Scan[19:0], Link6Scan[19:0] , Link5Scan[19:0] , 

Link4Scan[19:0], Link3Scan[19:0] , _ERROR, _RESET, 

Link2Scan[19:0], LinklScan[19:0] , COMMSCLK, __COMMSCLK, 
LinkOScan[19:0] } 

Each link scan bit order is:- 

{Linkln8, Linkln7, Linkln6, Linkln5, Linkln4, 
LinklnClk, Linkln3, Linkln2, Linklnl, LinklnO, 
LinkOutO, LinkOutl, LinkOut2, LinkOut3, LinkOut4, 
LinkOutClk, LinkOut5, LinkOut6, LinkOut7, LinkOut8} 
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C.2 Link Switch State. 

Eight instructions exist to access read only state within each link. One instruction per 
link. Each register is 15 bits long. The bits are defined below :- 

Bit(s) 

6:0 RouteByte. Valid if waiting for connection. 

7 Switch input is connected and transmitting 
packet to one or more outputs. 

8 Switch input is waiting to be connected. 

9 Switch output is connected to a switch input. 

10 Switch output FirstEop. Waiting for next byte 
to decide if the output should rearbitrate to 
a new switch input. 

11 Receiver flow control fifo is not empty 

12 No more fifo space for link transmitter to 
send data to. 

13 Link output pads tri-state for power up. 

14 Link is in reset 

C.3 Link Reset Control. 

This is a 16 bit read/write register that is used to give individual link reset control. 
Two bits per link. The coding is as follows :- 

Value Meaning 

Normal operation 

1 Reset Switch input. 

i.e. Send reset on to all outputs connected 
to input, NAck and gobble the rest of the 
packet . 

2 Disable switch output. 

i.e. PAck and gobble all packets. 

3 Hold link in reset 

The bit ordering of the whole register is :- 

{RC7[1:0], RC6[1:0J, RC5[1:0], RC4[1:0], 
RC3[1:0], RC2[1:0], RC1[1:0], RC0[1:0]} 
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C.4 Waiting timeout control. 

This register is used to give individual link control to the behaviour of a switch input 
that has been waiting to connect to an output for more than the waiting timeout value 
defined in the Global register. The waiting timeout is a 16 bit read/write register with 
two bits per link. The meaning of each bit is given below :- 

Bit Enable automatic Nack and gobble. 

Bit 1 Enable setting of input open timeout error bit. 

The bit ordering of the whole register is :- 

{TC7[1:0], TC6[1:0], TC5[1:0], TC4[1:0], 
TC3[1:0], TC2[1:0], TC1[1:0], TC0[1:0]} 

The whole register is zeroed on chip reset. 

C.5 Error value register. 

This is a 72 bit read only register. It is split into 8 * 9 bit registers, one 9 bit register 
per link. Each register holds the value of the last data error received by the link. The 
bit ordering of the whole register is :- 

{EV7[8:0], EV6[8:0], EV5[8:0], EV4[8:0], 
EV3[8:0], EV2[8:0], EV1[8:0], EV0[8:0]} 

C.6 Error Flag Register 

This is a 40 bit read / bit clearable register. There are five bits per link. The error bits 
can be read. They are set to 1 if the error occurs, and are cleared by writing a 1 to the 
corresponding bit position from the TAP. Writing a to a bit from the TAP causes the 
value not to change. All error bits are cleared by chip reset The order of the error 
bits for each link is given below :- 

Bit Route Parity Error. Set by route parity error. 
Bit 1 Transaction CRC error. Set by an invalid 

transaction. 
Bit 2 Received Phase Error. Set if input was unable 

to keep phase aligned to incoming link data. 
Bit 3 Received Data Error. Set if received data was 

not a valid data byte or command. 
Bit 4 Input Open Timeout. Set if packet was connected 

to an output for too long, or if enabled the 

packet was waiting to be connected for too 
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long. 

Whole ieg is:- 

{Lnk7EF[4:0], Lnk6EF [ 4 : ] , Lnk5EF [ 4 : ] , Lnk4EF[4:0], 
Lnk3EF[4:0] / Lnk2EF [ 4 : ] , LnklEF[4:0], LnkOEF[4:0]} 

C.7 Global control register. 

This is an 11 bit read/write register. It is divided into 6 fields as follows:- 

Bit(s) 

2:0 Waiting Timeout Control value 

22-2 9us (Reset value) 

1 44-58us 

2 88-117uS 

3 176-234uS 

4 352-468uS 

5 704-936uS 

6 1408-1873us 

7 2816-3745uS 

4:3 Perf meter control. 

Perf meter is disabled, 

1 Perf meter is testing nand gates, 

2 Perf meter is testing nor gates, 

3 Perf meter is testing inv gates and track load, 

5 Mux perf meter output onto the error pin. 

6 Disable transaction checking. When set, 
transaction CRC error bits will not be set. 

9:7 BroadcastTop value. 

10 ChipReset. Ored with the __RESET pin to form 
the chip reset value. 
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C.8 Perf meter Count register and timeout register. 

This is a 24 bit read/write incrementing register. Under normal operation it is 
incremented every 256 Comms cycles. It is used to provide timeout periods for both 
open and waiting packets. It is also used to control the duration of tri-state after a 
link is reconnected. If the Perf meter control field of the global register is non-zero, 
then it is incremented by the Perf meter output. 
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